{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true,
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## 使用CRF实现中文命名实体识别（NER）\n",
    "\n",
    "目录\n",
    "\n",
    "* [数据集说明](#数据集说明)\n",
    "* [BiLSTM-CRF实现说明](#BiLSTM-CRF实现说明)\n",
    "* [实验环境说明](#实验环境说明)\n",
    "* [1.配置](#1.配置)\n",
    "* [2.加载数据](#2.加载数据）)\n",
    "* [3.搭建模型](#3.搭建模型)\n",
    "* [4.模型训练&保存模型](#4.模型训练&保存模型)\n",
    "* [5.测试](#5.测试)\n",
    "* [6.预测](#6.预测)\n",
    "\n",
    "## 数据集说明\n",
    "\n",
    "数据集：人民日报 中文 NER数据集\n",
    "\n",
    "数据集说明：实体为人名（PER）、地名（LOC）和组织机构名（ORG）。数据集一行对应一个中文字符以及标注符号，标注系统采用BIO系统。\n",
    "\n",
    "其数据格式如下："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "海 O\n",
    "钓 O\n",
    "比 O\n",
    "赛 O\n",
    "地 O\n",
    "点 O\n",
    "在 O\n",
    "厦 B-LOC\n",
    "门 I-LOC\n",
    "与 O\n",
    "金 B-LOC\n",
    "门 I-LOC\n",
    "之 O\n",
    "间 O\n",
    "的 O\n",
    "海 O\n",
    "域 O\n",
    "。 O"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
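  {
   "cell_type": "markdown",
   "source": [
    "To make the BIO scheme concrete, the following is a minimal sketch that groups character-level tags back into entity strings. The helper name `bio_to_spans` is hypothetical and is not used elsewhere in this notebook; it assumes one tag per character, as in the sample above.\n",
    "\n",
    "```python\n",
    "def bio_to_spans(chars, tags):\n",
    "    # Collect (entity_text, entity_type) pairs from character-level BIO tags.\n",
    "    spans = []\n",
    "    current_chars, current_type = [], None\n",
    "    for ch, tag in zip(chars, tags):\n",
    "        if tag.startswith('B-'):\n",
    "            # A new entity starts; close the one that was open, if any.\n",
    "            if current_chars:\n",
    "                spans.append((''.join(current_chars), current_type))\n",
    "            current_chars, current_type = [ch], tag[2:]\n",
    "        elif tag.startswith('I-') and current_type == tag[2:]:\n",
    "            # Continuation of the open entity.\n",
    "            current_chars.append(ch)\n",
    "        else:\n",
    "            # 'O', or an I- tag that does not continue the open entity.\n",
    "            if current_chars:\n",
    "                spans.append((''.join(current_chars), current_type))\n",
    "            current_chars, current_type = [], None\n",
    "    if current_chars:\n",
    "        spans.append((''.join(current_chars), current_type))\n",
    "    return spans\n",
    "\n",
    "chars = list('在厦门与金门之间')\n",
    "tags = ['O', 'B-LOC', 'I-LOC', 'O', 'B-LOC', 'I-LOC', 'O', 'O']\n",
    "print(bio_to_spans(chars, tags))\n",
    "```\n",
    "\n",
    "For the fragment 在厦门与金门之间 this prints the two LOC entities 厦门 and 金门. An `I-` tag that does not continue an open entity of the same type is treated like `O`, which is only one of several reasonable ways to handle malformed tag sequences."
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },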
  {
   "cell_type": "markdown",
   "source": [
    "## BiLSTM-CRF实现说明\n",
    "\n",
    "LSTM 一般已经足以用于词性标注、NER等任务，但是再与CRF结合后，能在CRF融合上下文局部特征的优势加持下，使得序列模型对于NER上的性能能有不错的提升。\n",
    "\n",
    "当已知观测序列$x$和状态序列$y$后，CRF计算条件概率公式为：\n",
    "\n",
    "$$P(y|x)=\\frac{\\exp \\{ score(x,y)\\}}{\\sum\\limits_{y'}\\exp \\{ score(x,y)\\}}$$\n",
    "\n",
    "而$score(x,y)$如何计算呢？即就是序列的状态特征得分与转移特征得分的和：\n",
    "$$\\begin{aligned}score(x, y)&=transition\\_score+emission\\_score\\\\&=\\sum\\limits_{i=1}^{n+1}A_{y_{i-1},y_i}+\\sum\\limits_{i=1}^{n+1}P_{y_i,x_i}\\end{aligned}$$\n",
    "\n",
    "其中$P_{y_i,x_i}$表示标签tag 为$y_i$ 条件下，观测word为$x_i$的emission_score，其来自时间步长$i$处的BiLSTM的隐藏状态。\n",
    "\n",
    "$A_{y_{i-1},y_i}$表示从标签tag $y_{i-1}$转移到标签tag $y_{i}$的transition_score，transition_score存储在维度为tag_num的转移矩阵中，它将作为模型参数的一部分，\n",
    "在训练中学习得到。\n",
    "\n",
    "## 实验环境说明\n",
    "\n",
    "|环境 | 版本/型号|\n",
    "---|---\n",
    "python| 3.6.9\n",
    "pytorch| 1.7.0\n",
    "cuda | 10.2\n",
    "gpu| NVIDIA V100 (32G) x 4张"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 1.配置"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "outputs": [],
   "source": [
    "import torch\n",
    "import torchkeras\n",
    "from torchcrf import CRF\n",
    "\n",
    "from tqdm import tqdm\n",
    "import datetime\n",
    "import time\n",
    "import copy\n",
    "from matplotlib import pyplot as plt\n",
    "from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, \\\n",
    "            confusion_matrix, classification_report\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "cwd_dir = '/home/xijian/pycharm_projects/Magic-NLPer/MachineLearning/CRF条件随机场/'\n",
    "data_base_dir = cwd_dir + 'data/rmrb/'\n",
    "save_dir = cwd_dir + 'save/'\n",
    "imgs_dir = cwd_dir + 'imgs/'\n",
    "\n",
    "pad_token = '<pad>'\n",
    "pad_id = 0\n",
    "unk_token = '<unk>'\n",
    "unk_id = 1\n",
    "\n",
    "tag_to_id = {'<pad>': 0, 'O': 1, 'B-LOC': 2, 'I-LOC': 3, 'B-PER': 4, 'I-PER': 5, 'B-ORG': 6, 'I-ORG': 7}\n",
    "id_to_tag = {id: tag for tag, id in tag_to_id.items()}\n",
    "word_to_id = {'<pad>': 0, '<unk>': 1}\n",
    "tags_num = len(tag_to_id)\n",
    "\n",
    "LR = 1e-3\n",
    "EPOCHS = 30\n",
    "\n",
    "maxlen = 60\n",
    "# total_words = 4000\n",
    "\n",
    "embedding_dim = 100\n",
    "hidden_size = 128\n",
    "batch_size = 512"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 2.加载数据"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "outputs": [],
   "source": [
    "# 读取数据  数据格式：字 tag\n",
    "def read_data(filepath):\n",
    "    sentences = []\n",
    "    tags = []\n",
    "    with open(filepath, 'r', encoding='utf-8') as f:\n",
    "        tmp_sentence = []\n",
    "        tmp_tags = []\n",
    "        for line in f:\n",
    "            if line == '\\n' and len(tmp_sentence) != 0:\n",
    "                assert len(tmp_sentence) == len(tmp_tags)\n",
    "                sentences.append(tmp_sentence)\n",
    "                tags.append(tmp_tags)\n",
    "                tmp_sentence = []\n",
    "                tmp_tags = []\n",
    "            else:\n",
    "                line = line.strip().split(' ')\n",
    "                tmp_sentence.append(line[0])\n",
    "                tmp_tags.append(line[1])\n",
    "        if len(tmp_sentence) != 0:\n",
    "            assert len(tmp_sentence) == len(tmp_tags)\n",
    "            sentences.append(tmp_sentence)\n",
    "            tags.append(tmp_tags)\n",
    "    return sentences, tags"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "查看和分析一下数据，有助于设置一些超参，例如vocab_size、max_length等等"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['海', '钓', '比', '赛', '地', '点', '在', '厦', '门', '与', '金', '门', '之', '间', '的', '海', '域', '。'] ['O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-LOC', 'I-LOC', 'O', 'B-LOC', 'I-LOC', 'O', 'O', 'O', 'O', 'O', 'O']\n",
      "最大句子长度：574, 最小句子长度：6, 平均句子长度：46.93, 句子长度中位数：40.00\n",
      "              s_len\n",
      "count  20864.000000\n",
      "mean      46.931557\n",
      "std       30.077038\n",
      "min        6.000000\n",
      "25%       28.000000\n",
      "50%       40.000000\n",
      "75%       58.000000\n",
      "max      574.000000\n"
     ]
    }
   ],
   "source": [
    "sentences, tags = read_data(data_base_dir + 'train.txt')\n",
    "print(sentences[0], tags[0])\n",
    "\n",
    "s_lengths = [len(s) for s in sentences]\n",
    "print('最大句子长度：{}, 最小句子长度：{}, 平均句子长度：{:.2f}, 句子长度中位数：{:.2f}'.format(\n",
    "    max(s_lengths), min(s_lengths), np.mean(s_lengths), np.median(s_lengths)))\n",
    "df_len = pd.DataFrame({'s_len': s_lengths})\n",
    "print(df_len.describe())"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "句子长度分布图：\n"
     ]
    },
    {
     "data": {
      "text/plain": "<AxesSubplot:title={'center':'sentence length '}, ylabel='Frequency'>"
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/plain": "<Figure size 432x288 with 1 Axes>",
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZMAAAEICAYAAACavRnhAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAcyElEQVR4nO3dfZAddZ3v8feHJBJIYp7AkU2QifK0YHhyhFiw6wAXwoMIPiySwiVYWbO3btyrtZQSXC6yCFayhSDelSyBsARcBXRFsuACMTKr1hYSHqJAApdRgkyCBJIQMtEEJnzvH/2b0ISZTE96zsz0mc+r6tR0//rXv/P9JqfmO/3rPt2KCMzMzMrYY6ADMDOz6nMxMTOz0lxMzMysNBcTMzMrzcXEzMxKczExM7PSXEzMBiFJIenAAXjfZklt/f2+Vn0uJjakSbpc0ncHOo6BMlBFy+qPi4mZmZXmYmKVIeliSWskbZb0jKSTU/sekuZK+q2k9ZLulDQhbWtMf33PlPR7Sa9I+oe07TTgq8BnJLVL+nVqHytpkaQX0/tdKWlY2nahpF9KulrSRknPSTo9F+MESf8qaW3a/uPcto9JWiHpVUn/LemIgnnvmd7v95JekvQvkvZK25oltUm6SNK6FPPncvtOlPQfkl6TtDzl8su07eep269T/p/J7dfleGbdcTGxSpB0CPAF4MMRMQaYDqxOm/8OOAf4KPBnwEbgOzsNcQJwCHAycJmkP4+I+4BvAHdExOiIODL1vQXoAA4EjgZOBf4mN9ZxwDPAPsA/AYskKW27DdgbOBx4D3Btiv9o4Gbgb4GJwA3AEkl7Fkh/HnAwcFSKaRJwWW77e4GxqX0W8B1J49O27wBbUp+Z6QVARPxlWjwy5X9HgfHMuhYRfvk16F9kv0TXAf8DGLHTtlXAybn1/YA3gOFAIxDA5Nz2h4Hz0vLlwHdz2xqAbcBeubYZwINp+UKgNbdt7zT+e9P7vgmM7yL+BcDXd2p7BvhoN/lGyllkxeADuW0fAZ5Ly83An4Dhue3rgGnAsPTvcEhu25XAL3d+n9x6t+MN9GfAr8H9Gt7r6mM2ACKiVdKXyH75Hy7pfuDvI2ItcABwl6Q3c7tsJysMnf6QW/4jMLqbtzoAGAG8+NbBBnsAL3Q1VkT8MfUbDUwANkTExm7GnSnp73Jt7yI7ktqVfckK1qO5eERWKDqtj4iO3HpnfvuSFdR87Pnl7nQ3nlm3PM1llRER34uIE8h+MQcwP216ATg9IsblXiMjYk2RYXdaf4HsyGSf3FjvjojDC4z1AjBB0rhutl21U4x7R8T3exjzFbIjhcNz+42NiCK/3F8mm66bnGvbv8B+Zr3mYmKVIOkQSSelcwxbyX7Bdh6J/AtwlaQDUt99JZ1dcOiXgEZJewBExIvAA8A3Jb07ndz/gKSP9jRQ2vc/gesljZc0QlLneYkbgf8p6ThlRkk6U9KYHsZ8M+17raT3pPwmSZpeIJ7twI+AyyXtLelQ4IIu8n9/T2OZ9cTFxKpiT7IT0a+QTTO9B7gkbbsOWAI8IGkz8BDZSfIifpB+rpf0WFq+gGwKaiXZyfwfkp0PKeKvyc5TPE12ruFLABHxCPB54J/TmK1k51+KuDj1f0jSa8BPyS4mKOILZCfT/0B2ccD3yY68Ol0OLE5XmJ1bcEyzd1CEH45lNlRImg+8NyJm9tjZrBd8ZGJWxyQdKumINLV2LNmlvncNdFxWf3w1l1l9G0M2tfVnZOdHvgncPaARWV3yNJeZmZXmaS4zMyutLqe59tlnn2hsbCzcf8uWLYwaNap2AQ2QesyrHnMC51Ul9ZgTZHk9/fTTr0TEvruzf10Wk8bGRh555JHC/VtaWmhubq5dQAOkHvOqx5zAeVVJPeYEWV4nnnji87u7v6e5zMysNBcTMzMrzcXEzMxKq8tzJmZmvfHGG2/Q1tbG1q1be+w7duxYVq1a1Q9R1c7IkSOZPHky
I0aM6LMxXUzMbMhra2tjzJgxNDY2krvVf5c2b97MmDG7vD/noBYRrF+/nra2NqZMmdJn43qay8yGvK1btzJx4sQeC0k9kMTEiRMLHYX1houJmRkMiULSqRa5upiYmVlpNT1nImk1sJnsEaodEdEkaQJwB9mzuVcD50bERmWl8jrgDLLHhF4YEY+lcWYCl6Zhr4yIxbWM28yGtsa59/bpeKvnndmn4w1G/XEC/sSIeCW3PhdYFhHzJM1N6xcDpwMHpddxwALguFR8vgY0kT1i9VFJS7p5zvaA6O6DNxQ+QGbWf5qbm7n66qtpamoa6FDeYSCmuc4GOo8sFgPn5NpvjcxDwDhJ+wHTgaURsSEVkKXAaf0cs5mZ7UKtj0yC7FGqAdwQEQuBhvSsbMgeJdqQlicBL+T2bUtt3bW/jaTZwGyAhoYGWlpaCgfZ3t7eq/47u2hqR5ftZcbsC2XzGozqMSdwXgNt7NixbN68uWbj9zT2li1bmDlzJmvXrmX79u185Stf4VOf+tQ7+m3fvp0tW7awefNmli1bxje+8Q1ef/11pkyZwvXXX8/o0aP54Ac/yIwZM7jvvvt44403uPXWWzn44IPfMdbWrVvf9n/T3t5eKsdaF5MTImKNpPcASyU9nd8YEZEKTWmpUC0EaGpqit7ciK3sjdsu7G6a6/zdH7Mv1OMN6eoxJ3BeA23VqlU1/e5IT2M/8MADvO997+P+++8HYNOmTV3uM2zYMEaNGsW2bdu45pprePDBBxk1ahTz58/nxhtv5LLLLkMSkyZNYsWKFVx//fUsWLCAm2666R1jjRw5kqOPPnrHetmiX9NprohYk36uI3tU6LHAS2n6ivRzXeq+Btg/t/vk1NZdu5lZXZg6dSpLly7l4osv5he/+AVjx47dZf+HHnqIlStXcvzxx3PUUUexePFinn/+rRv+fvKTnwTgQx/6EKtXr65l6DvUrJhIGiVpTOcycCrwJLAEmJm6zeStR4guAS5Iz6qeBmxK02H3A6dKGi9pfBrn/lrFbWbW3w4++GAee+wxpk6dyqWXXsoVV1yxy/4RwSmnnMKKFStYsWIFK1euZNGiRTu277nnnkB2JNPR0fU0fF+r5TRXA3BX+nLMcOB7EXGfpOXAnZJmAc8D56b+PyG7LLiV7NLgzwFExAZJXweWp35XRMSGGsZtZkPcrq7ErMXtVNauXcuECRP47Gc/y7hx47qclsqbNm0ac+bMobW1lQMPPJAtW7awZs2aLs+N9JeaFZOI+B1wZBft64GTu2gPYE43Y90M3NzXMZqZDQZPPPEEX/7yl9ljjz0YMWIECxYs2GX/fffdl1tuuYUZM2awbds2AK688sr6LCZmZlbM9OnTmT59eo/98ifJTzrpJJYvX/6OPvlzJE1NTf12NZ1vp2JmZqX5yMTMbJD5xCc+wXPPPfe2tvnz5xc6ehkoLiZmZmRXSA2WOwffddddNR0/O0XdtzzNZWZD3siRI1m/fn1NfskONp0Pxxo5cmSfjusjEzMb8iZPnkxbWxsvv/xyj323bt3a57+I+1vnY3v7kouJmQ15I0aMKPwI25aWlrfdhsQynuYyM7PSXEzMzKw0FxMzMyvNxcTMzEpzMTEzs9JcTMzMrDQXEzMzK83FxMzMSvOXFnuhsZtnvZuZDXU+MjEzs9JcTMzMrDQXEzMzK83FxMzMSnMxMTOz0lxMzMysNBcTMzMrzcXEzMxKczExM7PSXEzMzKw0FxMzMyvNxcTMzEpzMTEzs9JcTMzMrDQXEzMzK83FxMzMSqt5MZE0TNLjku5J61Mk/UpSq6Q7JL0rte+Z1lvT9sbcGJek9mckTa91zGZm1jv9cWTyRWBVbn0+cG1EHAhsBGal9lnAxtR+beqHpMOA84DDgdOA6yUN64e4zcysoJoWE0mTgTOBm9K6gJOAH6Yui4Fz0vLZaZ20/eTU/2zg9ojYFhHPAa3AsbWM28zMeqfWz4D/FvAVYExanwi8GhEdab0N
mJSWJwEvAEREh6RNqf8k4KHcmPl9dpA0G5gN0NDQQEtLS+Eg29vbC/W/aGpHj33yehNDLRTNq0rqMSdwXlVSjzlBllcZNSsmkj4GrIuIRyU11+p9OkXEQmAhQFNTUzQ3F3/LlpYWivS/cO69vYpp9fnFY6iFonlVST3mBM6rSuoxJyj/x28tj0yOBz4u6QxgJPBu4DpgnKTh6ehkMrAm9V8D7A+0SRoOjAXW59o75fcxM7NBoGbnTCLikoiYHBGNZCfQfxYR5wMPAp9O3WYCd6flJWmdtP1nERGp/bx0tdcU4CDg4VrFbWZmvVfrcyZduRi4XdKVwOPAotS+CLhNUiuwgawAERFPSboTWAl0AHMiYnv/h21mZt3pl2ISES1AS1r+HV1cjRURW4G/6mb/q4CrahehmZmV4W/Am5lZaS4mZmZWmouJmZmV5mJiZmaluZiYmVlpLiZmZlaai4mZmZXmYmJmZqW5mJiZWWkuJmZmVpqLiZmZleZiYmZmpbmYmJlZaS4mZmZW2kA8z2TIaOzmMb+r553Zz5GYmdWWj0zMzKw0FxMzMyvNxcTMzEpzMTEzs9JcTMzMrDQXEzMzK83FxMzMSnMxMTOz0lxMzMysNBcTMzMrzcXEzMxKK1RMJE2tdSBmZlZdRY9Mrpf0sKT/JWlsTSMyM7PKKVRMIuIvgPOB/YFHJX1P0ik1jczMzCqj8DmTiHgWuBS4GPgo8G1JT0v6ZK2CMzOzaih6zuQISdcCq4CTgLMi4s/T8rU1jM/MzCqg6MOx/i9wE/DViPhTZ2NErJV0aU0iMzOzyig6zXUm8L3OQiJpD0l7A0TEbV3tIGlkOmn/a0lPSfrH1D5F0q8ktUq6Q9K7Uvueab01bW/MjXVJan9G0vQS+ZqZWQ0ULSY/BfbKre+d2nZlG3BSRBwJHAWcJmkaMB+4NiIOBDYCs1L/WcDG1H5t6oekw4DzgMOB08iuLBtWMG4zM+sHRYvJyIho71xJy3vvaofIdO4zIr2C7DzLD1P7YuCctHx2WidtP1mSUvvtEbEtIp4DWoFjC8ZtZmb9oOg5ky2SjomIxwAkfQj4Uw/7kI4gHgUOBL4D/BZ4NSI6Upc2YFJangS8ABARHZI2ARNT+0O5YfP75N9rNjAboKGhgZaWloKpQXt7e6H+F03t6LFPEb2JrYyieVVJPeYEzqtK6jEnyPIqo2gx+RLwA0lrAQHvBT7T004RsR04StI44C7g0N0Ls2cRsRBYCNDU1BTNzc2F921paaFI/wvn3rub0b3d6vN7fq++UDSvKqnHnMB5VUk95gTl/8gtVEwiYrmkQ4FDUtMzEfFG0TeJiFclPQh8BBgnaXg6OpkMrEnd1pB9KbJN0nBgLLA+194pv4+ZmQ0CvbnR44eBI4BjgBmSLthVZ0n7piMSJO0FnEL2PZUHgU+nbjOBu9PykrRO2v6ziIjUfl662msKcBDwcC/iNjOzGit0ZCLpNuADwApge2oO4NZd7LYfsDidN9kDuDMi7pG0Erhd0pXA48Ci1H8RcJukVmAD2RVcRMRTku4EVgIdwJw0fWZmZoNE0XMmTcBh6UihkIj4DXB0F+2/o4ursSJiK/BX3Yx1FXBV0fc2M7P+VXSa60myk+5mZmbvUPTIZB9gpaSHyb6MCEBEfLwmUZmZWaUULSaX1zIIMzOrtqKXBv+XpAOAgyLip+m+XL6liZmZAcVvQf95sluc3JCaJgE/rlFMZmZWMUVPwM8Bjgdegx0PynpPrYIyM7NqKVpMtkXE650r6RvqhS8TNjOz+la0mPyXpK8Ce6Vnv/8A+I/ahWVmZlVStJjMBV4GngD+FvgJ2fPgzczMCl/N9SZwY3qZmZm9TdF7cz1HF+dIIuL9fR6RmZlVTm/uzdVpJNk9tCb0fThmZlZFhc6ZRMT63GtNRHwLOLO2oZmZWVUUneY6Jre6B9mRStGjGjMzq3NFC8I3c8sdwGrg3D6PxszMKqno1Vwn1jqQoaSxm2fJ
r57nmUMzq6ai01x/v6vtEXFN34RjZmZV1JuruT5M9jx2gLPInsP+bC2CMjOzailaTCYDx0TEZgBJlwP3RsRnaxWYmZlVR9HbqTQAr+fWX09tZmZmhY9MbgUelnRXWj8HWFyTiMzMrHKKXs11laT/BP4iNX0uIh6vXVhmZlYlRae5APYGXouI64A2SVNqFJOZmVVM0cf2fg24GLgkNY0AvluroMzMrFqKHpl8Avg4sAUgItYCY2oVlJmZVUvRYvJ6RATpNvSSRtUuJDMzq5qixeROSTcA4yR9HvgpflCWmZklPV7NJUnAHcChwGvAIcBlEbG0xrGZmVlF9FhMIiIk/SQipgIuIGZm9g5Fp7kek/ThmkZiZmaVVfQb8McBn5W0muyKLpEdtBxRq8DMzKw6dllMJL0vIn4PTO+neMzMrIJ6mub6MUBEPA9cExHP51+72lHS/pIelLRS0lOSvpjaJ0haKunZ9HN8apekb0tqlfSb/KOCJc1M/Z+VNLNUxmZm1ud6KibKLb+/l2N3ABdFxGHANGCOpMOAucCyiDgIWJbWAU4HDkqv2cACyIoP8DWyqbZjga91FiAzMxsceiom0c1yjyLixYh4LC1vBlYBk4CzeeuOw4vJ7kBMar81Mg+RfadlP7IptqURsSEiNpJdUXZab2IxM7PaUvbF9m42Stt564T7XsAfOzeRnYB/d6E3kRqBnwMfBH4fEeNSu4CNETFO0j3AvIj4Zdq2jOx+YM3AyIi4MrX/H+BPEXH1Tu8xm+yIhoaGhg/dfvvtRUIDoL29ndGjR/fY74k1mwqPuTumThrbp+MVzatK6jEncF5VUo85QZbXWWed9WhENO3O/rs8AR8Rw3YvrLdIGg38O/CliHgtqx87xg9JvTri6U5ELAQWAjQ1NUVzc3PhfVtaWijS/8K59+5mdMWsPr/nGHqjaF5VUo85gfOqknrMCbK8yujNLeh7TdIIskLybxHxo9T8Upq+Iv1cl9rXAPvndp+c2rprNzOzQaJmxSRNYS0CVkXENblNS4DOK7JmAnfn2i9IV3VNAzZFxIvA/cCpksanE++npjYzMxskin5pcXccD/w18ISkFantq8A8shtHzgKeB85N234CnAG0kp2b+RxARGyQ9HVgeep3RURsqGHcZmbWSzUrJulEurrZfHIX/QOY081YNwM39110ZmbWl2p6zsTMzIYGFxMzMyvNxcTMzEpzMTEzs9JcTMzMrLRaXhpsvdTYzTfsV887s58jMTPrHR+ZmJlZaS4mZmZWmouJmZmV5mJiZmaluZiYmVlpLiZmZlaai4mZmZXmYmJmZqW5mJiZWWkuJmZmVpqLiZmZleZiYmZmpbmYmJlZaS4mZmZWmouJmZmV5mJiZmaluZiYmVlpLiZmZlaai4mZmZXmYmJmZqW5mJiZWWkuJmZmVpqLiZmZleZiYmZmpbmYmJlZaS4mZmZWWs2KiaSbJa2T9GSubYKkpZKeTT/Hp3ZJ+rakVkm/kXRMbp+Zqf+zkmbWKl4zM9t9w2s49i3APwO35trmAssiYp6kuWn9YuB04KD0Og5YABwnaQLwNaAJCOBRSUsiYmMN46Zx7r21HN7MrO7U7MgkIn4ObNip+WxgcVpeDJyTa781Mg8B4yTtB0wHlkbEhlRAlgKn1SpmMzPbPbU8MulKQ0S8mJb/ADSk5UnAC7l+bamtu/Z3kDQbmA3Q0NBAS0tL4aDa29vf1v+iqR2F9+0Pvcklb+e86kE95gTOq0rqMSfI8iqjv4vJDhERkqIPx1sILARoamqK5ubmwvu2tLSQ73/hIJvmWn1+827tt3Ne9aAecwLnVSX1mBPs/h+tnfq7mLwkab+IeDFNY61L7WuA/XP9Jqe2NUDzTu0t/RDnoNLdOZzV887s50jMzLrW35cGLwE6r8iaCdyda78gXdU1DdiUpsPuB06VND5d+XVqajMzs0GkZkcmkr5PdlSxj6Q2squy5gF3SpoFPA+cm7r/BDgDaAX+
CHwOICI2SPo6sDz1uyIidj6pb2ZmA6xmxSQiZnSz6eQu+gYwp5txbgZu7sPQzMysj/kb8GZmVpqLiZmZleZiYmZmpbmYmJlZaS4mZmZWmouJmZmV5mJiZmaluZiYmVlpLiZmZlaai4mZmZXmYmJmZqUN2PNMrDzfmt7MBgsfmZiZWWkuJmZmVpqLiZmZleZiYmZmpbmYmJlZaS4mZmZWmi8NrkOdlwxfNLWDC3OXD/uSYTOrFR+ZmJlZaS4mZmZWmouJmZmV5mJiZmaluZiYmVlpLiZmZlaai4mZmZXm75kMIb5lvZnVio9MzMysNBcTMzMrzcXEzMxK8zkT6/Zcyq74PIuZ5fnIxMzMSqtMMZF0mqRnJLVKmjvQ8ZiZ2VsqMc0laRjwHeAUoA1YLmlJRKwc2MiGrt2ZGuuKp8vM6kMliglwLNAaEb8DkHQ7cDbgYlJxvS1KOz+jpZOLktnAqkoxmQS8kFtvA47Ld5A0G5idVtslPdOL8fcBXikV4SD0v+swr+5y0vwBCKZv1d3/VVKPedVjTpDldcDu7lyVYtKjiFgILNydfSU9EhFNfRzSgKvHvOoxJ3BeVVKPOcGOvBp3d/+qnIBfA+yfW5+c2szMbBCoSjFZDhwkaYqkdwHnAUsGOCYzM0sqMc0VER2SvgDcDwwDbo6Ip/rwLXZreqwC6jGveswJnFeV1GNOUDIvRURfBWJmZkNUVaa5zMxsEHMxMTOz0oZ8ManybVok3SxpnaQnc20TJC2V9Gz6OT61S9K3U56/kXTMwEXePUn7S3pQ0kpJT0n6YmqvbF6SRkp6WNKvU07/mNqnSPpViv2OdHEJkvZM661pe+OAJtADScMkPS7pnrRe+bwkrZb0hKQVkh5JbZX9DAJIGifph5KelrRK0kf6MqchXUxyt2k5HTgMmCHpsIGNqlduAU7bqW0usCwiDgKWpXXIcjwovWYDC/opxt7qAC6KiMOAacCc9H9S5by2ASdFxJHAUcBpkqYB84FrI+JAYCMwK/WfBWxM7demfoPZF4FVufV6yevEiDgq952SKn8GAa4D7ouIQ4Ejyf7P+i6niBiyL+AjwP259UuASwY6rl7m0Ag8mVt/BtgvLe8HPJOWbwBmdNVvML+Au8nuyVYXeQF7A4+R3cHhFWB4at/xWSS7avEjaXl46qeBjr2bfCanX0InAfcAqpO8VgP77NRW2c8gMBZ4bud/777MaUgfmdD1bVomDVAsfaUhIl5My38AGtJy5XJN0yBHA7+i4nmlqaAVwDpgKfBb4NWI6Ehd8nHvyClt3wRM7NeAi/sW8BXgzbQ+kfrIK4AHJD2abtUE1f4MTgFeBv41TUneJGkUfZjTUC8mdS2yPykqee23pNHAvwNfiojX8tuqmFdEbI+Io8j+kj8WOHRgIypP0seAdRHx6EDHUgMnRMQxZNM9cyT9ZX5jBT+Dw4FjgAURcTSwhbemtIDyOQ31YlKPt2l5SdJ+AOnnutRemVwljSArJP8WET9KzZXPCyAiXgUeJJv+GSep84vD+bh35JS2jwXW92+khRwPfFzSauB2sqmu66h+XkTEmvRzHXAX2R8AVf4MtgFtEfGrtP5DsuLSZzkN9WJSj7dpWQLMTMszyc45dLZfkK7SmAZsyh3eDhqSBCwCVkXENblNlc1L0r6SxqXlvcjOAa0iKyqfTt12zqkz108DP0t/NQ4qEXFJREyO7OaA55HFeT4Vz0vSKEljOpeBU4EnqfBnMCL+ALwg6ZDUdDLZIzz6LqeBPjE00C/gDOD/kc1h/8NAx9PL2L8PvAi8QfaXxyyyOehlwLPAT4EJqa/Irlz7LfAE0DTQ8XeT0wlkh9q/AVak1xlVzgs4Ang85fQkcFlqfz/wMNAK/ADYM7WPTOutafv7BzqHAjk2A/fUQ14p/l+n11Odvxeq/BlMcR4FPJI+hz8GxvdlTr6dipmZlTbUp7nMzKwPuJiYmVlpLiZmZlaai4mZmZXmYmJmZqW5mJiZ
WWkuJmZmVtr/B6TLRlhWMromAAAAAElFTkSuQmCC\n"
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "print('句子长度分布图：')\n",
    "# df_len.s_len.hist(bins=50)\n",
    "df_len.plot.hist('s_len',  grid=True, title='sentence length ', bins=50)\n",
    "# plt.show()"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "分位点为0.774的句子长度:60\n"
     ]
    },
    {
     "data": {
      "text/plain": "<Figure size 432x288 with 1 Axes>",
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXwAAAEXCAYAAACu1P9TAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAs7UlEQVR4nO3deXzU1b3/8dcnk40kyJKERQKETQQFQaPUYpVCRdygrdTl1lasXnqtttqqVVvrVm6Ltve6VWvt5q16VdCLoqWlLlB/tVUEISAgAmFJWAMJSxKyzvn9Md/EScgyJDMZJvN+PpxHZr7f8z3fzwnjJ2fO98z5mnMOERHp+hKiHYCIiHQOJXwRkTihhC8iEieU8EVE4oQSvohInFDCFxGJE0r4Ih4zc2Y2PArnnWRmRZ19Xok/SvjS6czsPjN7LtpxREs4/rCY2QVm9q6ZHTazYjP7u5lN9/bNMrM6MyvzHlvM7I9mdlLQ8bleHGVBj/yOtk2Ob0r4IjHGzGYC84E/ATlAX+Ae4NKgYv9yzmUAPYAvAUeAFWZ2apPqejrnMrzHaZGPXqJJCV9aZWZ3mNkOrye5wcymeNsTzOxOM9tsZvvNbJ6Z9fb21fcerzGz7Wa2z8x+7O2bBvwIuCK4V2lmPczs92a2yzvfHDPzeftmmdk/zOyXZlbq9VgvDIqxt9eD3entfzVo3yVmtsrMDpjZP81sbIjtTvHOt93M9pjZU2bWzds3ycyKzOxWM9vrxXxt0LGZZva6mR0ysw+9tvzD2/euVyzfa/8VQcc1W1+TuAz4b+CnzrnfOecOOuf8zrm/O+f+vWl551ydc26zc+47wN+B+0Jpv3RNSvjSIjMbCdwEnOmc6w5cAGz1dn8X+DJwHnAiUAo80aSKc4CRwBTgHjMb5Zz7K/Az4KUmvcpngFpgODAemApcH1TXBGADkAU8BPzeS34AzwJpwClAH+BhL/7xwB+AbwOZwG+AhWaWEkLz5wInAeO8mAYQ6EXX60eg9zwAuA54wsx6efueAMq9Mtd4DwCcc+d6T0/z2v9SCPUFGwkMBF4OoQ1N/R/whXYcJ12Fc04PPZp9EEh0ewkMCSQ12bcemBL0uj9QAyQCuYADcoL2LwOu9J7fBzwXtK8vUAV0C9p2FbDEez4L2BS0L82rv593Xj/Qq5n4f02gJxy8bQNwXgvtdV6bjUDCHha072xgi/d8EoEhksSg/XuBzwE+7/cwMmjfHOAfTc8T9LrF+pqJcaJ3fGor/26zgs8XtH0aUOM9r/83OhD0uC3a7zk9IvtIbOVvgcQ559wmM7uFQII+xcwWAz9wzu0EBgMLzMwfdEgdgeRdb3fQ8wogo4VTDQaSgF2fddpJAAqbq8s5V+GVywB6AyXOudIW6r3GzL4btC2ZwCeS1mQT+KOyIigeI5DM6+13ztUGva5vXzaBP3rBsQc/b0lL9R1VzvvZH9gSQr3BBgAlTbZlNTmvdGEa0pFWOef+1zl3DoHk6YAHvV2FwIXOuZ5Bj1Tn3I5Qqm3yupBADz8rqK4TnHOnhFBXIdDbzHq2sO8/m8SY5px7oY069xHocZ8SdFwPF7gI2pZiAkNTOUHbBoZwXKg2EGjXZe049ivA/wtjLBJjlPClRWY20swme2PelQSSYH2P/ingP81ssFc228xmhFj1HiDXzBIAnHO7gL8B/2VmJ3gXhIeZ2XltVeQd+xfgSTPrZWZJZlY/Tv5b4D/MbIIFpJvZxWbWvY06/d6xD5tZH699A8zsghDiqSMwVn6fmaWZ2cnAN5tp/9C26mqhfgf8APiJmV0b9Ps6x8yeblrezHxmNsTMHicwdHR/e84rXYMSvrQmhcDFy30EhlT6AHd5+x4FFgJ/M7PDwPsELqyGYr73c7+ZfeQ9/yaB4ZZ1BC4Av0xg2CIU3yAwbv4JgbHvWwCcc8uBfwd+5dW5icD4diju8Mq/b2aHgLcIXDANxU0ELsDuJnBB+QUC
n2Dq3Qf8jzdz6PIQ62zgnHsZuAL4FrCTwB+QOcBrQcXONrMy4BCwFDiBwMX3Ncd6Puk6LNBhEJFIMbMHgX7OuWvaLCwSQerhi4SZmZ1sZmO9YaSzCEyzXBDtuEQ0S0ck/LoTGMY5kcBwy3/ReLhFJCo0pCMiEic0pCMiEieiNqSTlZXlcnNzo3V6EZGYtGLFin3Ouez2HBu1hJ+bm8vy5cujdXoRkZhkZtvae6yGdERE4oQSvohInFDCFxGJE0r4IiJxQglfRCROtJnwzewP3m3XPm5hv5nZY2a2ycxWm9np4Q9TREQ6KpQe/jME7pTTkguBEd5jNoG7DImIyHGmzXn4zrl3zSy3lSIzgD9563S/b2Y9zay/t065SLOcc/hd0E8czhF40HgfzWxrqXz9Nr9zuKDz0FDG29dMHYGfXj0EttNoe+DYxu0Iek6jF43LhXLMUfsa/75aqBpaqK/pqimh1N1arE3P3HJ9zcdzLDG0fEwr9bX862/599euWJvU3cLv/Oh/p8+2nDUkk5H9Wr0tQ0SE44tXA2h8C7cib9tRCd/MZhP4FMCgQYPCcGoJl6raOiqq6iirqqWiuv5nLeVVtZRX1VFe7f2sqg3aF9heVeOn1u+nps5R6/dTW+eo9Ttq6z7bVud31NQ5aur83kNrOEn8mvPlU2M24YfMOfc08DRAXl6e/o+PoDq/Y+eBI2wqLmPrvnKKD1dRWlFNSXk1pRU1HK6s5XBlDWVVgaQeagJOMEhPTiQ9JZG0FB8ZKYmkJCaQmJBAapKR5EsgMcH76TMSEwKvE32BbUm++n0J+MxIMDADMwv85LNtCd79ZK2+HJCQYBjNlzcL7Euo3xdUR0Iz5Wl4Xl8nDWW8/xrtS/CeY41/Jxa0wSx4e5NyFmq5xrUfc90t1tVyrE01PlfLx4RSrj3ta3pcS+07llhbPCYMdbfQpEbHBB+XkRKdRQ7CcdYdNL5nZ463TTpBWVUtBcVlbC4uo6C4vOFnwb5yqms/u794YoLRKz2Z3mnJ9ExLYkDPbpyQ2p30lEQyUhNJT/aRnpJ4VDIPvP5sX2pSQqP/QUQkdoQj4S8EbjKzFwnc4u6gxu8j43BlDcu3lfJ+wX7W7TzExj1l7D5U2bDfl2AM6p3GsOx0zj0pm2HZ6QzNzmBoVjq905OVqEXiXJsJ38xeIHDz4ywzKwLuBZIAnHNPAYuAiwjc/7MCuDZSwcajotIK3li9i7+t3U1+0UHq/I4knzGq/wl8flgmw/pkMCw7g+F90hnUO53kRH21QkSaF8osnava2O+AG8MWkeCcY+mGYn79980s21ICwNicHnxn0jDOHprJ+EG96Jbsi3KUIhJrdIvD44hzjiUb9vLoWxvJLzpITq9u3H7BSKafdiIDe6dFOzwRiXFK+MeB+kT/yFsbWe0l+gcvG8NXT88hyachGhEJDyX8KCuvquX2l/NZtGY3A3t346HLxvKV0wco0YtI2CnhR9GWfeV8+9nlbNpbxh3TTub6LwxRoheRiFHCj5IV20q49o8f4kswnr1uAhOHZ0U7JBHp4pTwo+D/bSxm9p9W0K9HKn/61lm6ICsinUIJv5MtXrub7/7vSoZmp/PsdRPI7p4S7ZBEJE4o4XeiN1bv5OYXVzFmQA/+59qz6JGWFO2QRCSOKOF3knkfFnLXgjWcPqgnf7z2rKgtniQi8UtTQjrBn1fv4oevrObzwzKV7EUkapR5IuzTPYe5df4q8gb34rffzCM1SUsiiEh0qIcfQUeq67jx+Y/ISEniyatPV7IXkahSDz+C/nPROjYVl/HstybQp3tqtMMRkTinHn6E/HPzPp57fzvfmjiEc0boS1UiEn1K+BFQU+fnJ69+zODMNG6bOjLa4YiIAEr4EfHc+9vYXFzOTy4erXXrReS4oYQfZiXl1Tz85qd8YUQWU0b1iXY4IiINlPDD7OE3P6W8uo6fXDJa95AVkeOKEn4YbdxzmOc/2MbVEwZxUt/u0Q5HRKQR
JfwweuTtjXRL8nHzl06KdigiIkdRwg+TDbsPs2jNLmZNzKV3enK0wxEROYoSfpg89vZG0pMTuf6codEORUSkWUr4YfDpnsP8ec0uZn0+l17q3YvIcUoJPwxeWLadZF8C150zJNqhiIi0SAm/g2rq/Lyev5PJJ/dR715EjmtK+B309w3F7CurZuYZOdEORUSkVUr4HfTyiiIy05M5b2R2tEMREWmVEn4HlJZX8/Yne/jy+AEk+fSrFJHjm7JUB7yxeic1dY7LTtdwjogc/5TwO2Dx2j0MzU5n9IknRDsUEZE2KeG308EjNbxfsJ/zR/eNdigiIiEJKeGb2TQz22Bmm8zszmb2DzKzJWa20sxWm9lF4Q/1+LJ0w15q/Y6po/tFOxQRkZC0mfDNzAc8AVwIjAauMrPRTYrdDcxzzo0HrgSeDHegzfnrX//KyJEjGT58OHPnzj1q//e//33GjRvHuHHjOOmkk+jZsycAS5Ysadg+btw4UlNTefXVVxsd+73vfY+MjIwWz/23dXvIykhh/MCeYWyRiEjkhHIT87OATc65AgAzexGYAawLKuOA+oHsHsDOcAbZnLq6Om688UbefPNNcnJyOPPMM5k+fTqjR3/2t+jhhx9ueP7444+zcuVKAL74xS+yatUqAEpKShg+fDhTp05tKLt8+XJKS0tbPHdVbR1/31DMJWP7k5CgNe9FJDaEMqQzACgMel3kbQt2H3C1mRUBi4DvNleRmc02s+Vmtry4uLgd4X5m2bJlDB8+nKFDh5KcnMyVV17Ja6+91mL5F154gauuuuqo7S+//DIXXnghaWlpQOAPye23385DDz3UYl3vF5RQVlWr8XsRiSnhumh7FfCMcy4HuAh41syOqts597RzLs85l5ed3bEvKu3YsYOBAwc2vM7JyWHHjh3Nlt22bRtbtmxh8uTJR+178cUXG/0h+NWvfsX06dPp379/i+d+c91uuiX5mDg8qwMtEBHpXKEM6ewABga9zvG2BbsOmAbgnPuXmaUCWcDecATZUS+++CIzZ87E52t8Q/Fdu3axZs0aLrjgAgB27tzJ/PnzWbp0aYt1Oed4a91ezj0pi9Qk3aBcRGJHKD38D4ERZjbEzJIJXJRd2KTMdmAKgJmNAlKBjo3ZtGHAgAEUFn420lRUVMSAAU1HmgKa9uLrzZs3j6985SskJSUBsHLlSjZt2sTw4cPJzc2loqKC4cOHNzpmzY6D7D5UyfmanSMiMabNHr5zrtbMbgIWAz7gD865tWb2ALDcObcQuBX4rZl9n8AF3FnOORepoCetXIlLTGTZ2rVMeOMNUvr0oeDZZ1k8fz4VdXVctHp1Q9mKLVtYv2cPG3Jz+Tywr7qamWvXAvDR737HkO9+l0krV3LDgAFccfHFfLh1K99Yvx6AwokTyZk/n0krV3LrwIFcmpXFC6uKSDCYfHKfSDVPRCQiQhnSwTm3iMDF2OBt9wQ9XwdMDG9orbPERIbfcQdrbrwR5/dz/r/9G6eccgp33X03+7KyyDrvPAD2Ll5MnwsuwKzxbJrKnTup2rOHnmeccUznnb9mB0lZybqNoYjEHItgR7xVeXl5bvny5VE5d3vtOniEs3/+Dn3G9WDZledEOxwRiUNmtsI5l9eeY7W0wjFY8kngssRzXzwtypGIiBw7JfxjsGTDXgb07MaIPi1/A1dE5HilhB+iOr/jg4L9pPdLYc62bdEOR0TkmCnhh2j9rkMcqqylpAe83cqyCyIixysl/BB9sKUEgNQ+KVGORESkfUKalinwfsF+BmemYWn6dq2IxCb18EPg9zuWbSnhc0Myox2KiEi7qYcfgvW7D3HwSA2fG9abTUkHoh2OiEi7KOGH4P2CwPj9hCGZfKWnblguIrFJQzohqB+/P7Fnt2iHIiLSbkr4bXDOsWJbKWfm9gbgroIC7iooiHJUIiLHTkM6bSgqPUJJeTWnefeu/dfBg9ENSESkndTDb8PqokCCPy2nR5QjERHpGCX8
NqwuOkCyL4GT+53QdmERkeOYEn4b8osOMKp/d5IT9asSkdimLNYKv9/x8Y5DjM3p2bAtJyWFnBQtryAisUcXbVtRsK+MsqpaxgaN3z83enQUIxIRaT/18FuRX+hdsPVm6IiIxDIl/Fas2XGQtGQfw7I/u+HJLRs3csvGjVGMSkSkfTSk04o1Ow5yyokn4Ev47Aboq8rKohiRiEj7qYffgto6P+t2HuLUAZp/LyJdgxJ+CzYXl3Okpq7RBVsRkVimhN+CNTsCF2zHqIcvIl2ExvBbsKboAOnJPoZkZTTaflJaWpQiEhHpGCX8FgQu2PZodMEW4OmRI6MUkYhIx2hIpxm1dX7W7dIFWxHpWpTwm7F1fwWVNX5Gn3j0gmmzN2xg9oYNUYhKRKRjNKTTjA27DwNwcr/uR+37tKKis8MREQkL9fCbsWH3IXwJxvA+GW0XFhGJESElfDObZmYbzGyTmd3ZQpnLzWydma01s/8Nb5ida/3uwwzJSic1yRftUEREwqbNIR0z8wFPAOcDRcCHZrbQObcuqMwI4C5gonOu1Mz6RCrgzrBh92HG6AtXItLFhNLDPwvY5JwrcM5VAy8CM5qU+XfgCedcKYBzbm94w+w8ZVW1bC+p4OS+R4/fA4zLyGBchoZ6RCT2hHLRdgBQGPS6CJjQpMxJAGb2HuAD7nPO/bVpRWY2G5gNMGjQoPbEG3Gf7vEu2PZv/paGj4wY0ZnhiIiETbgu2iYCI4BJwFXAb82sZ9NCzrmnnXN5zrm87OzsMJ06vD7Z1fIMHRGRWBZKwt8BDAx6neNtC1YELHTO1TjntgCfEvgDEHM27D5ERkoiA3p2a3b/1evWcfW6dc3uExE5noWS8D8ERpjZEDNLBq4EFjYp8yqB3j1mlkVgiKcgfGF2nk92H+akvhkkNFlSoV5RVRVFVVWdHJWISMe1mfCdc7XATcBiYD0wzzm31sweMLPpXrHFwH4zWwcsAW53zu2PVNCR4pzjk92HGdmv+fF7EZFYFtI3bZ1zi4BFTbbdE/TcAT/wHjFrz6EqDh6pYVR/jd+LSNejb9oGWb/7EAAjW5iSKSISy7SWTpDP1tBpeUjn7B76QpaIxCYl/CAbdh+mf49UeqQltVjm50OHdmJEIiLhoyGdIJuLy7Rgmoh0WUr4HuccBcXlDM1Kb7XcZR9/zGUff9xJUYmIhI+GdDzFh6soq6plaHbrPfz9NTWdFJGISHiph+/ZXFwOwNDs1nv4IiKxSgnfU7CvDKDNHr6ISKxSwvcUFJeTmpRA/xNSox2KiEhEaAzfU1BcxpCsltfQqTelV69OikhEJLyU8D0F+8o5dUDbX6r6SW5u5IMREYkADekAVbV1FJZUMKyNKZkiIrFMCR/Yvr8Cvwvtgu2Fq1dz4erVnRCViEh4aUiHY5uSeaSuLtLhiIhEhHr4fDYlc4iGdESkC1PCJzAls0/3FLqntrxomohIrFPCB7bsK1fvXkS6PI3hA9tLKpg8sk9IZS/JzIxwNCIikRH3Cf9IdR3Fh6sY2LtbSOVvGzQowhGJiERG3A/pFJVWADCwd1qUIxERiay4T/jbS44t4U9auZJJK1dGMiQRkYiI+4Rf6CX8Qerhi0gXF/cJf3vJEbol+chMT452KCIiEaWEX1LBoN5pmLW+SqaISKyL+4RfVFqhC7YiEhfielqmc47tJRWcPSz0ufWX9wltvr6IyPEmrhN+SXk1FdV1x3TB9jsDBkQwIhGRyInrIZ2GKZm9Qk/4FXV1VGjFTBGJQXHdw69P+IMyQ0/4F3lr4S8dPz4iMYmIREpc9/CLSo8AkNMrtGUVRERiWVwn/O37K8jKSCEtOa4/6IhInAgp4ZvZNDPbYGabzOzOVspdZmbOzPLCF2LkFJZWMCjERdNERGJdmwnfzHzAE8CFwGjgKjMb3Uy57sDNwAfhDjJStpdoDr6IxI9QevhnAZuccwXOuWrgRWBGM+V+CjwIVIYxvoipqfOz62DlMa+hM6tf
P2b16xehqEREIieUhD8AKAx6XeRta2BmpwMDnXN/bq0iM5ttZsvNbHlxcfExBxtOuw5UUud3xzQlE2BW//7M6t8/QlGJiEROhy/amlkC8N/ArW2Vdc497ZzLc87lZWdnd/TUHXKsyyLX21ddzb7q6kiEJCISUaEk/B3AwKDXOd62et2BU4GlZrYV+Byw8Hi/cFvYcOOTY7toO3PtWmauXRuJkEREIiqUhP8hMMLMhphZMnAlsLB+p3PuoHMuyzmX65zLBd4Hpjvnlkck4jDZXlJBYoLRv4dm6YhIfGgz4TvnaoGbgMXAemCec26tmT1gZtMjHWCkFJZUkNOrG74ELYssIvEhpG8cOecWAYuabLunhbKTOh5W5BVqSqaIxJm4/aat5uCLSLyJyzUFDlfWUFpRc8xTMgFu0PLIIhKj4jLhF5YEFk1rz43Lr9ANUEQkRsXlkE57p2QCFFZWUlgZE18mFhFpJE57+N46+O3o4X9j/XpA6+GLSOyJzx5+SQXdUxPp0S0p2qGIiHSauEz420sqGNgrDTPNwReR+BG3Cb89wzkiIrEs7hK+3+8oKj3Srgu2IiKxLO4u2haXVVFV6293D//WgQPbLiQichyKu4RfP0Mnp50J/9KsrHCGIyLSaeJuSGd7B6ZkAmyoqGBDRUU4QxIR6RRx2MM/ghkM6Nm+Mfxvb9gAaB6+iMSeuOzh9+2eSmqSL9qhiIh0qrhL+IWakikicSr+En5pBTmakikicSiuEn5VbR27D1Wqhy8icSmuLtruKD2Cc7RrHfx6dw8eHMaIREQ6T1wl/IYpmZntT/hf6t07XOGIiHSquBrS6ciyyPVWHT7MqsOHwxWSiEiniasefmHpEZITE8jOSGl3Hbds2gRoHr6IxJ646uFv31/BwF7dSEjQssgiEn/iKuEXllYwUDN0RCROxVXC1zr4IhLP4ibhH6yo4XBlbYemZIqIxLK4uWhbPyWzo0M6Pxs6NBzhiIh0urhL+B0d0vl8jx7hCEdEpNPFzZBOYWl9D79j6+j88+BB/nnwYDhCEhHpVHHVw++VlkT31KQO1fOjggJA8/BFJPbETw+/RFMyRSS+hZTwzWyamW0ws01mdmcz+39gZuvMbLWZvW1mx90KY0r4IhLv2kz4ZuYDngAuBEYDV5nZ6CbFVgJ5zrmxwMvAQ+EOtCPq/I4dB45oSqaIxLVQevhnAZuccwXOuWrgRWBGcAHn3BLnXP2dvd8HcsIbZsfsPlRJTZ3Tl65EJK6FctF2AFAY9LoImNBK+euAvzS3w8xmA7MBBg0aFGKIHbd9f3imZAI8Mnx4h+sQEYmGsM7SMbOrgTzgvOb2O+eeBp4GyMvLc+E8d2vCNSUTYFz37h2uQ0QkGkJJ+DuAgUGvc7xtjZjZl4AfA+c556rCE154FJZUkGBwYs+OJ/y3SkoA3QhFRGJPKAn/Q2CEmQ0hkOivBP4tuICZjQd+A0xzzu0Ne5QdVFhSQf8e3UjydXwW6pxt2wAlfBGJPW1mQOdcLXATsBhYD8xzzq01swfMbLpX7BdABjDfzFaZ2cKIRdwOWiVTRCTEMXzn3CJgUZNt9wQ9/1KY4wqr7SVHmHxydrTDEBGJqi7/Tdsj1XXsK6tSD19E4l6XT/ifzdBRwheR+NblF08rDNM6+PV+M3JkWOoREelsXT7hh2sd/Hoj0/RJQURiU9cf0ik5QrckH5npyWGp7/V9+3h9376w1CUi0pniooc/qHcaZhaW+v6rMLDKxKVZWWGpT0Sks8RBD78iLEsqiIjEui6d8P1+x7aScnIz06MdiohI1HXphL/ncCWVNX4GZynhi4h06YS/dV9ghk5upmbWiIh06Yu22/aXA4R1SOfZUaPCVpeISGfq0gl/6/4KknwWlmWR6w1MTQ1bXSIinalLD+ls21/OwN5p+BLCMyUT4KW9e3lp73G3ArSISJu6fA8/3DN0fr0jcO+XK/r0CWu9IiKR1mV7+M45tu0vZ7Au2IqIAF04
4RcfrqKiuk5z8EVEPF024W/dH5iSqR6+iEhA1034+8I/JVNEJJZ12Yu2m4rLSE5MIKdXeNfRefmUU8Jan4hIZ+myCX/jnsMMzUon0RfeDzFZyeFZZllEpLN12YS/qbiMcQN7hb3eZ3btAmBW//5hr1u6tpqaGoqKiqisrIx2KBIDUlNTycnJISkpKWx1dsmEX1FdS1HpEb52xsCw1/3M7t2AEr4cu6KiIrp3705ubm7Y7s8gXZNzjv3791NUVMSQIUPCVm+XvGhbUFyOczC8T0a0QxFpUFlZSWZmppK9tMnMyMzMDPunwS6Z8DfuPQzACCV8Oc4o2UuoIvFe6ZIJf9PeMhITjMGakiki0qBLJvyNe8rIzUonObFLNk9EoujAgQM8+eST0Q6jXbpkRty0t4zh2ZEZzlk0diyLxo6NSN0iEjl1dXVhqSeWE36Xm6VTVVvHtpIKLhoTmVk0aT5fw/MDBw5w/fXX8/HHH2Nm/OEPf2DkyJFcccUVbN26ldzcXObNm0evXuGfHiqx7f7X17Ju56Gw1jn6xBO499K2vxj4pz/9iV/+8peYGWPHjsXn83HJJZcwc+ZMADIyMigrK2Pp0qXce++99OzZkzVr1nD55ZczZswYHn30UY4cOcKrr77KsGHDmD9/Pvfffz8+n48ePXrw7rvv8swzz7B8+XJ+9atfAXDJJZdw2223MWnSJDIyMrjhhhtYtGgR/fv352c/+xk//OEP2b59O4888gjTp09vNu5nnnmGBQsWcPDgQXbs2MHVV1/NvffeC8Bzzz3HY489RnV1NRMmTODJJ5/E5/ORkZHBt7/9bd566y2eeOIJCgoKGrX92Wefpbi4mP/4j/9g+/btADzyyCNMnDiR++67j+3bt1NQUMD27du55ZZb+N73vsedd97J5s2bGTduHOeffz733nsvM2bMoLS0lJqaGubMmcOMGTMA+OlPf8pzzz1HdnY2AwcO5IwzzuC2225j8+bN3HjjjRQXF5OWlsZvf/tbTj755A6/B9rS5Xr4W/dVUOd3jOgbmR7+kzt28KS3RPLNN9/MtGnT+OSTT8jPz2fUqFHMnTuXKVOmsHHjRqZMmcLcuXMjEodIe6xdu5Y5c+bwzjvvkJ+fz6OPPtpq+fz8fJ566inWr1/Ps88+y6effsqyZcu4/vrrefzxxwF44IEHWLx4Mfn5+SxcuLDNGMrLy5k8eTJr166le/fu3H333bz55pssWLCAe+65p9Vjly1bxiuvvMLq1auZP38+y5cvZ/369bz00ku89957rFq1Cp/Px/PPP99wrgkTJpCfn0+vXr2abfvNN9/M97//fT788ENeeeUVrr/++obzffLJJyxevJhly5Zx//33U1NTw9y5cxk2bBirVq3iF7/4BampqSxYsICPPvqIJUuWcOutt+Kca6gvPz+fv/zlLyxfvryh3tmzZ/P444+zYsUKfvnLX/Kd73ynzd9bOHS5Hv6mvWUADIvQkM487+YnX8/IaOjJACQnJ5OcnMxrr73G0qVLAbjmmmuYNGkSDz74YERikdgVSk88Et555x2+9rWvkZWVBUDv3r1bLX/mmWfS3/vOybBhw5g6dSoAY8aMYcmSJQBMnDiRWbNmcfnll/PVr361zRiSk5OZNm1aQz0pKSkkJSUxZswYtm7d2uqx559/PpmZmQB89atf5R//+AeJiYmsWLGCM888E4AjR47Qx7tfhc/n47LLLmu17W+99Rbr1q1rOMehQ4coKwvkkYsvvpiUlBRSUlLo06cPe/bsOSom5xw/+tGPePfdd0lISGDHjh3s2bOH9957jxkzZpCamkpqaiqXXnopAGVlZfzzn//ka1/7WkMdVVVVbf7ewqFLJnyzyCX8elu2bCE7O5trr72W/Px8zjjjDB599FH27NnT8D9Iv379mn2DiBxPEhMT8fv9APj9fqqrqxv2paSkNDxPSEhoeJ2QkEBtbS0ATz31FB988AF//vOfOeOMM1ixYkWjOoFG
88mTkpIaphy2VGdLmk5VNDOcc1xzzTX8/Oc/P6p8amoqvqBh2Ob4/X7ef/99Upu5fWlw+30+X7PxPf/88xQXF7NixQqSkpLIzc1tdf683++nZ8+erFq1qtW4IiGkIR0zm2ZmG8xsk5nd2cz+FDN7ydv/gZnlhj3SEG0qLiOnVze6Jbf+j9xRtbW1fPTRR9xwww2sXLmS9PT0o4ZvzEzzruW4MnnyZObPn8/+/fsBKCkpITc3lxUrVgCwcOFCampqjqnOzZs3M2HCBB544AGys7MpLCwkNzeXVatW4ff7KSwsZNmyZWGJ/80336SkpKThGsLEiROZMmUKL7/8Mnu9T98lJSVs27btqGObazvA1KlTG4angDYTcffu3Tl8+HDD64MHD9KnTx+SkpJYsmRJw7knTpzI66+/TmVlJWVlZbzxxhsAnHDCCQwZMoT58+cDgU8I+fn57fyNHJs2e/hm5gOeAM4HioAPzWyhc25dULHrgFLn3HAzuxJ4ELgiEgG3JZIzdILl5OSQk5PDhAkTAJg5cyZz586lb9++7Nq1i/79+7Nr166Gj5Yix4NTTjmFH//4x5x33nn4fD7Gjx/Pgw8+yIwZMzjttNOYNm0a6enH9v2V22+/nY0bN+KcY8qUKZx22mkADBkyhNGjRzNq1ChOP/30sMR/1llncdlll1FUVMTVV19NXl4eAHPmzGHq1Kn4/X6SkpJ44oknGDx4cJttf+aZZ3jssce48cYbGTt2LLW1tZx77rk89dRTLcaQmZnJxIkTOfXUU7nwwgu54447uPTSSxkzZgx5eXkNF1/PPPNMpk+fztixY+nbty9jxoyhR48eQOBTwQ033MCcOXOoqanhyiuvbPi9RZI551ovYHY2cJ9z7gLv9V0AzrmfB5VZ7JX5l5klAruBbNdK5Xl5eS74IkZ7rCo8wM8XrafO76hzjjq/Y+3OQ3xrYi4/vnh0h+puyaSVKwFYOn48X/jCF/jd737HyJEjue+++ygvD6zBn5mZyZ133sncuXMpKSnhoYceikgsElvWr1/PqFGjoh1GzGo68ycWlJWVkZGRQUVFBeeeey5PP/30Mf3xa+49Y2YrnHN57YknlDH8AUBh0OsiYEJLZZxztWZ2EMgE9jUJdDYwG2DQoEHtibeR+sGS5MQEfAmGL8GYfHIfvjx+QIfrbsnS8eMbnj/++ON8/etfp7q6mqFDh/LHP/4Rv9/P5Zdfzu9//3sGDx7MvHnzIhaLiBzfZs+ezbp166isrOSaa64J2yed9gqlhz8TmOacu957/Q1ggnPupqAyH3tlirzXm70y+5qrE8LTwxeJJerhh2bx4sXccccdjbYNGTKEBQsWRCmi6IlGD38HELzOcI63rbkyRd6QTg9gf3sCEpH4dsEFF3DBBRdEO4wuKZRZOh8CI8xsiJklA1cCTb9dsRC4xns+E3intfF7kXil/y0kVJF4r7SZ8J1ztcBNwGJgPTDPObfWzB4ws/rvQP8eyDSzTcAPgKOmborEu9TUVPbv36+kL22qvwFKc98N6Ig2x/AjRWP4Em90i0M5Fi3d4jDSY/giEgZJSUlhvV2dyLHqcouniYhI85TwRUTihBK+iEiciNpFWzMrBo5e4Sg0WTT5Fm8XoDbFBrUpNnTlNg12zmW3p4KoJfyOMLPl7b1KfbxSm2KD2hQb1KbmaUhHRCROKOGLiMSJWE34T0c7gAhQm2KD2hQb1KZmxOQYvoiIHLtY7eGLiMgxUsIXEYkTMZfw27qh+vHKzP5gZnu9m8XUb+ttZm+a2UbvZy9vu5nZY14bV5tZdG+T0wIzG2hmS8xsnZmtNbObve0x2y4zSzWzZWaW77Xpfm/7EDP7wIv9JW+pcMwsxXu9ydufG9UGtMDMfGa20sze8F7HdHsAzGyrma0xs1VmttzbFrPvPQAz62lmL5vZJ2a23szODmebYirh22c3VL8QGA1cZWaRuXlt+D0DTGuy7U7gbefcCOBtPltW+kJghPeY
Dfy6k2I8VrXArc650cDngBu9f49YblcVMNk5dxowDphmZp8DHgQeds4NB0qB67zy1wGl3vaHvXLHo5sJLG9eL9bbU++LzrlxQfPTY/m9B/Ao8Ffn3MnAaQT+zcLXJudczDyAs4HFQa/vAu6KdlzHEH8u8HHQ6w1Af+95f2CD9/w3wFXNlTueH8BrwPldpV1AGvARgXs47wMSve0N70MC94k423ue6JWzaMfepB05XqKYDLxB4HbQMdueoHZtBbKabIvZ9x6BOwVuafr7DmebYqqHT/M3VI/cHcsjr69zbpf3fDfQ13sec+30PvqPBz4gxtvlDX+sAvYCbwKbgQMucDMgaBx3Q5u8/QeBzE4NuG2PAD8E/N7rTGK7PfUc8DczW2Fms71tsfzeGwIUA3/0ht9+Z2bphLFNsZbwuywX+BMdk3NkzSwDeAW4xTl3KHhfLLbLOVfnnBtHoGd8FnBydCNqPzO7BNjrnFsR7Vgi4Bzn3OkEhjZuNLNzg3fG4HsvETgd+LVzbjxQTpO7B3a0TbGW8EO5oXos2WNm/QG8n3u97THTTjNLIpDsn3fO/Z+3OebbBeCcOwAsITDk0dPM6m8YFBx3Q5u8/T2A/Z0baasmAtPNbCvwIoFhnUeJ3fY0cM7t8H7uBRYQ+OMcy++9IqDIOfeB9/plAn8AwtamWEv4odxQPZYE3/z9GgJj4PXbv+ldhf8ccDDoI91xw8yMwP2M1zvn/jtoV8y2y8yyzayn97wbgWsS6wkk/plesaZtqm/rTOAdrxd2XHDO3eWcy3HO5RL4/+Ud59zXidH21DOzdDPrXv8cmAp8TAy/95xzu4FCMxvpbZoCrCOcbYr2hYp2XNi4CPiUwLjqj6MdzzHE/QKwC6gh8Jf8OgJjo28DG4G3gN5eWSMwG2kzsAbIi3b8LbTpHAIfL1cDq7zHRbHcLmAssNJr08fAPd72ocAyYBMwH0jxtqd6rzd5+4dGuw2ttG0S8EZXaI8Xf773WFufC2L5vefFOQ5Y7r3/XgV6hbNNWlpBRCROxNqQjoiItJMSvohInFDCFxGJE0r4IiJxQglfRCROKOGLiMQJJXwRj5ktNbO8tkuKxCYlfBGROKGEL12e9zX8P1vgpiYfm9kVIRwz1cz+ZWYfmdl8b4G4+ptu3O9tX2NmMbuwmsQfJXyJB9OAnc6505xzpwJ/ba2wmWUBdwNfcoHVGJcDPwgqss/b/mvgtgjFLBJ2SvgSD9YA55vZg2b2BefcwTbKf47AHdXe89bFvwYYHLS/flXQFQRuaiMSExLbLiIS25xzn3r3+7wImGNmbzvnHmjlEAPedM5d1cL+Ku9nHfp/SGKIevjS5ZnZiUCFc+454BcE1hhvzfvARDMb7h2fbmYnRThMkYhT70TiwRjgF2bmJ7A89Q2tFXbOFZvZLOAFM0vxNt9NYFlukZil5ZFFROKEhnREROKEhnQk7pjZAmBIk813OOcWRyMekc6iIR0RkTihIR0RkTihhC8iEieU8EVE4oQSvohInPj/aQjxqoE2FzgAAAAASUVORK5CYII=\n"
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "from collections import Counter\n",
    "# df_len = df_len.sort_values(by='s_len', axis=0, ascending=True).reset_index(drop=True)\n",
    "c = Counter(s_lengths) # 句子长度： 该长度对应的句子数\n",
    "df_cumsum = pd.DataFrame(c.items(), columns=['s_len', 'cnt'])\n",
    "df_cumsum = df_cumsum.sort_values(by='s_len', axis=0, ascending=True).reset_index(drop=True)\n",
    "# df_cumsum.head()\n",
    "df_cumsum['cumsum'] = df_cumsum['cnt'].cumsum()\n",
    "# df_cumsum.head()\n",
    "df_cumsum['cumsum_percentage'] = df_cumsum['cumsum']/len(sentences)\n",
    "# df_cumsum.head()\n",
    "# df_cumsum.tail()\n",
    "ax = df_cumsum.plot('s_len', 'cumsum_percentage', title='sentence length CDF')\n",
    "# print(ax)\n",
    "# 寻找句子长度为100的分位点（或者直接从分位点如0.90去找句子长度也行）\n",
    "quantile = 0\n",
    "quantile_len = 60\n",
    "for i,row in df_cumsum.iterrows():\n",
    "    if row['s_len'] >= quantile_len:\n",
    "        quantile = round(row['cumsum_percentage'], 3)\n",
    "        break\n",
    "print(\"\\n分位点为%s的句子长度:%d\" % (quantile, quantile_len))\n",
    "ax.hlines(quantile, 0, quantile_len, colors=\"c\", linestyles=\"dashed\")\n",
    "ax.vlines(quantile_len, 0, quantile, colors=\"c\", linestyles=\"dashed\")\n",
    "ax.text(0, quantile, str(quantile))\n",
    "ax.text(quantile_len, 0, str(quantile_len))\n",
    "plt.show()"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "所以可以设置最大句子长度为60，因为已经覆盖了3/4的句子"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
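  {
   "cell_type": "markdown",
   "source": [
    "The cutoff above is read off the plotted CDF. As a minimal sketch, the same kind of cutoff can also be computed directly from a target quantile. The helper name `length_at_quantile` and the toy lengths below are illustrative assumptions, not part of the original notebook.\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "def length_at_quantile(lengths, q):\n",
    "    # Smallest length L such that at least a fraction q of the sequences have length <= L\n",
    "    ordered = sorted(lengths)\n",
    "    idx = max(math.ceil(q * len(ordered)) - 1, 0)\n",
    "    return ordered[idx]\n",
    "\n",
    "demo_lengths = [5, 12, 30, 45, 58, 60, 75, 120]  # toy data, not the real corpus\n",
    "print(length_at_quantile(demo_lengths, 0.75))  # prints 60\n",
    "```\n",
    "\n",
    "Going from the quantile to the length makes the trade-off explicit: a higher `q` truncates fewer sentences but increases the amount of padding per batch."
   ],
   "metadata": {
    "collapsed": false
   }
  },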
  {
   "cell_type": "code",
   "execution_count": 41,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "vocab size: 4314\n"
     ]
    }
   ],
   "source": [
    "def build_vocab(sentences):\n",
    "    global word_to_id\n",
    "    for sentence in sentences:  # 建立word到索引的映射\n",
    "        for word in sentence:\n",
    "            if word not in word_to_id:\n",
    "                word_to_id[word] = len(word_to_id)\n",
    "    return word_to_id\n",
    "\n",
    "word_to_id = build_vocab(sentences)\n",
    "print('vocab size:', len(word_to_id))"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
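  {
   "cell_type": "markdown",
   "source": [
    "`build_vocab` extends the global `word_to_id` in place. A minimal sketch of the same idea without global state is shown here; the function name `build_vocab_from` and the special tokens `<PAD>` and `<UNK>` are assumptions made for illustration and may differ from the entries the notebook reserves.\n",
    "\n",
    "```python\n",
    "def build_vocab_from(sentences, specials=('<PAD>', '<UNK>')):\n",
    "    # Reserve the first ids for the special tokens, so that padding gets id 0\n",
    "    vocab = {token: idx for idx, token in enumerate(specials)}\n",
    "    for sentence in sentences:\n",
    "        for char in sentence:\n",
    "            # setdefault only assigns a new id the first time a token is seen\n",
    "            vocab.setdefault(char, len(vocab))\n",
    "    return vocab\n",
    "\n",
    "demo_vocab = build_vocab_from([['a', 'b'], ['c', 'b']])\n",
    "print(demo_vocab)  # {'<PAD>': 0, '<UNK>': 1, 'a': 2, 'b': 3, 'c': 4}\n",
    "```\n",
    "\n",
    "Returning a fresh dictionary keeps repeated calls independent of each other, which is convenient when the cell is re-run."
   ],
   "metadata": {
    "collapsed": false
   }
  },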
  {
   "cell_type": "code",
   "execution_count": 7,
   "outputs": [],
   "source": [
    "def convert_to_ids_and_padding(seqs, to_ids):\n",
    "    ids = []\n",
    "    for seq in seqs:\n",
    "        if len(seq)>=maxlen: # 截断\n",
    "            ids.append([to_ids[w] if w in to_ids else unk_id for w in seq[:maxlen]])\n",
    "        else: # padding\n",
    "            ids.append([to_ids[w] if w in to_ids else unk_id for w in seq] + [0]*(maxlen-len(seq)))\n",
    "\n",
    "    return torch.tensor(ids, dtype=torch.long)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
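  {
   "cell_type": "markdown",
   "source": [
    "The conversion step truncates long sequences and right-pads short ones with 0, and later cells rebuild the padding mask by testing for that 0. A minimal pure-Python sketch of both steps follows; the helper names `pad_or_truncate` and `padding_mask` are hypothetical.\n",
    "\n",
    "```python\n",
    "def pad_or_truncate(ids, max_len, pad_id=0):\n",
    "    # Cut sequences longer than max_len, right-pad shorter ones with pad_id\n",
    "    clipped = list(ids[:max_len])\n",
    "    return clipped + [pad_id] * (max_len - len(clipped))\n",
    "\n",
    "def padding_mask(ids, pad_id=0):\n",
    "    # True at real tokens, False at padded positions\n",
    "    return [token != pad_id for token in ids]\n",
    "\n",
    "row = pad_or_truncate([7, 3, 9], max_len=5)\n",
    "print(row)                # [7, 3, 9, 0, 0]\n",
    "print(padding_mask(row))  # [True, True, True, False, False]\n",
    "```\n",
    "\n",
    "Because the mask is recovered from the ids, this scheme only works if id 0 is never assigned to a real token or tag."
   ],
   "metadata": {
    "collapsed": false
   }
  },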
  {
   "cell_type": "code",
   "execution_count": 8,
   "outputs": [],
   "source": [
    "def load_data(filepath, word_to_id, shuffle=False):\n",
    "    sentences, tags = read_data(filepath)\n",
    "\n",
    "    inps = convert_to_ids_and_padding(sentences, word_to_id)\n",
    "    trgs = convert_to_ids_and_padding(tags, tag_to_id)\n",
    "\n",
    "    inp_dset = torch.utils.data.TensorDataset(inps, trgs)\n",
    "    inp_dloader = torch.utils.data.DataLoader(inp_dset,\n",
    "                                              batch_size=batch_size,\n",
    "                                              shuffle=shuffle,\n",
    "                                              num_workers=4)\n",
    "    return inp_dloader"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "sample_batch: 2 torch.Size([512, 60]) torch.int64 torch.Size([512, 60]) torch.int64\n"
     ]
    }
   ],
   "source": [
    "# 查看data pipeline是否生效\n",
    "inp_dloader = load_data(data_base_dir + 'train.txt', word_to_id)\n",
    "sample_batch = next(iter(inp_dloader))\n",
    "print('sample_batch:', len(sample_batch), sample_batch[0].size(), sample_batch[0].dtype,\n",
    "      sample_batch[1].size(), sample_batch[1].dtype)  # [b,60] int64"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 3.搭建模型"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "----------------------------------------------------------------\n",
      "        Layer (type)               Output Shape         Param #\n",
      "================================================================\n",
      "         Embedding-1              [-1, 60, 100]         431,400\n",
      "              LSTM-2              [-1, 60, 128]          84,992\n",
      "            Linear-3                [-1, 60, 8]           1,032\n",
      "================================================================\n",
      "Total params: 517,424\n",
      "Trainable params: 517,424\n",
      "Non-trainable params: 0\n",
      "----------------------------------------------------------------\n",
      "Input size (MB): 0.000229\n",
      "Forward/backward pass size (MB): 0.108032\n",
      "Params size (MB): 1.973816\n",
      "Estimated Total Size (MB): 2.082077\n",
      "----------------------------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "ngpu = 1\n",
    "device = 'cpu'\n",
    "\n",
    "class BiLSTM_CRF(torch.nn.Module):\n",
    "    def __init__(self, vocab_size, hidden_size):\n",
    "        super(BiLSTM_CRF, self).__init__()\n",
    "\n",
    "        self.hidden_size = hidden_size\n",
    "\n",
    "        self.embedding = torch.nn.Embedding(num_embeddings=vocab_size, embedding_dim=embedding_dim, padding_idx=pad_id)\n",
    "        self.bi_lstm = torch.nn.LSTM(input_size=embedding_dim, hidden_size=hidden_size // 2, batch_first=True,\n",
    "                                     bidirectional=True)  # , dropout=0.2)\n",
    "        self.hidden2tag = torch.nn.Linear(hidden_size, tags_num)\n",
    "\n",
    "        self.crf = CRF(num_tags=tags_num, batch_first=True)\n",
    "\n",
    "    def init_hidden(self, batch_size):\n",
    "        # device = 'cpu'\n",
    "        _batch_size = batch_size//ngpu\n",
    "        return (torch.randn(2, _batch_size, self.hidden_size // 2, device=device),\n",
    "                torch.randn(2, _batch_size, self.hidden_size // 2, device=device))  # ([b=1,2,hidden_size//2], [b=1,2,hidden_size//2])\n",
    "\n",
    "\n",
    "    def forward(self, inp):  # inp [b, seq_len=60]\n",
    "        self.bi_lstm.flatten_parameters()\n",
    "\n",
    "        embeds = self.embedding(inp)  # [b,seq_len]=>[b, seq_len, embedding_dim]\n",
    "        lstm_out, _ = self.bi_lstm(embeds, None)  # lstm_out: =>[b, seq_len, hidden_size], #####################################################\n",
    "        # lstm_out, self.hidden = self.bi_lstm(embeds, self.hidden)  # lstm_out: =>[b, seq_len, hidden_size], #####################################################\n",
    "        # h_n: ([b,2,hidden_size//2], c_n: [b,2,hidden_size//2])\n",
    "\n",
    "        logits = self.hidden2tag(lstm_out)  # [b, seq_len, hidden_size]=>[b, seq_len, tags_num]\n",
    "        return logits # [b, seq_len=60, tags_num=10]\n",
    "\n",
    "    # 计算CRF 条件对数似然，并返回其负值作为loss\n",
    "    def crf_neg_log_likelihood(self, inp, tags, mask=None, inp_logits=False):  # [b, seq_len, tags_num], [b, seq_len]\n",
    "        if inp_logits:\n",
    "            logits = inp\n",
    "        else:\n",
    "            logits = self.forward(inp)\n",
    "\n",
    "        if mask is None:\n",
    "            mask = torch.logical_not(torch.eq(tags, torch.tensor(0)))  # =>[b, seq_len],每个元素为bool值，如果序列中有pad，则mask相应位置就为False\n",
    "            mask = mask.type(torch.uint8)\n",
    "\n",
    "        crf_llh = self.crf(logits, tags, mask, reduction='mean') # Compute the conditional log likelihood of a sequence of tags given emission scores\n",
    "        # crf_llh = self.crf(logits, tags, mask) # Compute the conditional log likelihood of a sequence of tags given emission scores\n",
    "        return -crf_llh\n",
    "\n",
    "    def crf_decode(self, inp, mask=None, inp_logits=False):\n",
    "        if inp_logits:\n",
    "            logits = inp\n",
    "        else:\n",
    "            logits = self.forward(inp)\n",
    "\n",
    "        if mask is None and inp_logits is False:\n",
    "            mask = torch.logical_not(torch.eq(inp, torch.tensor(0)))  # =>[b, seq_len],每个元素为bool值，如果序列中有pad，则mask相应位置就为False\n",
    "            mask = mask.type(torch.uint8)\n",
    "\n",
    "        return self.crf.decode(emissions=logits, mask=mask)\n",
    "\n",
    "# 查看模型\n",
    "model = BiLSTM_CRF(len(word_to_id), hidden_size)\n",
    "torchkeras.summary(model, input_shape=(60,), input_dtype=torch.int64)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
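  {
   "cell_type": "markdown",
   "source": [
    "`crf_decode` hands the emission scores to the CRF layer, which returns the best tag sequence for each sentence. Assuming the layer performs standard Viterbi decoding for a linear-chain CRF, the search can be sketched in NumPy as below. This is an illustration under simplifying assumptions (a single sentence, no padding mask, no separate start or end scores), not the layer's actual implementation; `viterbi_decode` is a hypothetical name.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def viterbi_decode(emissions, transitions):\n",
    "    # emissions: [seq_len, num_tags]; transitions[i, j]: score of moving from tag i to tag j\n",
    "    seq_len, num_tags = emissions.shape\n",
    "    score = emissions[0].copy()  # best score of any path ending in each tag at step 0\n",
    "    backpointers = []\n",
    "    for t in range(1, seq_len):\n",
    "        # candidate[i, j]: best path ending in tag i at t-1, extended with tag j at t\n",
    "        candidate = score[:, None] + transitions + emissions[t][None, :]\n",
    "        backpointers.append(candidate.argmax(axis=0))\n",
    "        score = candidate.max(axis=0)\n",
    "    # Follow the backpointers from the best final tag to recover the path\n",
    "    path = [int(score.argmax())]\n",
    "    for bp in reversed(backpointers):\n",
    "        path.append(int(bp[path[-1]]))\n",
    "    path.reverse()\n",
    "    return path, float(score.max())\n",
    "\n",
    "emissions = np.array([[3.0, 0.0], [0.0, 1.0], [0.0, 1.0]])\n",
    "print(viterbi_decode(emissions, np.zeros((2, 2))))  # ([0, 1, 1], 5.0)\n",
    "penalty = np.array([[0.0, -10.0], [-10.0, 0.0]])    # switching tags is expensive\n",
    "print(viterbi_decode(emissions, penalty))           # ([0, 0, 0], 3.0)\n",
    "```\n",
    "\n",
    "The second call shows what the transition scores contribute: with a large penalty on switching tags, the best path stays on tag 0 even though tag 1 has the higher emission score at the last two steps."
   ],
   "metadata": {
    "collapsed": false
   }
  },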
  {
   "cell_type": "markdown",
   "source": [
    "## 4.模型训练&保存模型"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "******** device: cuda:0\n"
     ]
    }
   ],
   "source": [
    "ngpu = 4 # 4张GPU卡\n",
    "use_cuda = torch.cuda.is_available() # 检测是否有可用的gpu\n",
    "device = torch.device(\"cuda:0\" if (use_cuda and ngpu>0) else \"cpu\")\n",
    "print('*'*8, 'device:', device)\n",
    "\n",
    "\n",
    "# 设置评价指标\n",
    "metric_func = lambda y_pred, y_true: accuracy_score(y_true, y_pred)\n",
    "metric_name = 'acc'\n",
    "df_history = pd.DataFrame(columns=[\"epoch\", \"loss\", metric_name, \"val_loss\", \"val_\"+metric_name])"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "outputs": [],
   "source": [
    "def train_step(model, inps, tags, optimizer):\n",
    "    inps = inps.to(device)\n",
    "    tags = tags.to(device)\n",
    "    mask = torch.logical_not(torch.eq(inps, torch.tensor(0)))  # =>[b, seq_len]\n",
    "    # 每个元素为bool值，如果序列中有pad，则mask相应位置就为False\n",
    "    # mask = mask.type(torch.uint8)\n",
    "    # mask = mask.to(device) # 由device上的数据生成的数据，也是在device上\n",
    "\n",
    "    model.train()  # 设置train mode\n",
    "    optimizer.zero_grad()  # 梯度清零\n",
    "\n",
    "\n",
    "    # forward\n",
    "    logits = model(inps)\n",
    "    loss = model.module.crf_neg_log_likelihood(logits, tags, mask=mask, inp_logits=True)\n",
    "\n",
    "    # backward\n",
    "    loss.backward()  # 反向传播计算梯度\n",
    "    optimizer.step()  # 更新参数\n",
    "\n",
    "    preds = model.module.crf_decode(logits, mask=mask, inp_logits=True) # List[List]\n",
    "    pred_without_pad = []\n",
    "    for pred in preds:\n",
    "        pred_without_pad.extend(pred)\n",
    "    tags_without_pad = torch.masked_select(tags, mask).cpu().numpy() # 返回是1维张量\n",
    "    # print('tags_without_pad:', tags_without_pad.shape, type(tags_without_pad)) # [5082] tensor\n",
    "    metric = metric_func(pred_without_pad, tags_without_pad)\n",
    "    # print('*'*8, metric) # 标量\n",
    "\n",
    "    return loss.item(), metric"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "outputs": [],
   "source": [
    "@torch.no_grad()\n",
    "def validate_step(model, inps, tags):\n",
    "    inps = inps.to(device)\n",
    "    tags = tags.to(device)\n",
    "    mask = torch.logical_not(torch.eq(inps, torch.tensor(0)))  # =>[b, seq_len],每个元素为bool值，如果序列中有pad，则mask相应位置就为False\n",
    "    # mask = mask.type(torch.uint8)\n",
    "    # mask = mask.to(device)\n",
    "\n",
    "    model.eval()  # 设置eval mode\n",
    "\n",
    "    # forward\n",
    "    logits = model(inps)\n",
    "    loss = model.module.crf_neg_log_likelihood(logits, tags, mask=mask, inp_logits=True)\n",
    "\n",
    "    preds = model.module.crf_decode(logits, mask=mask, inp_logits=True)  # List[List]\n",
    "    pred_without_pad = []\n",
    "    for pred in preds:\n",
    "        pred_without_pad.extend(pred)\n",
    "    tags_without_pad = torch.masked_select(tags, mask).cpu().numpy()  # 返回是1维张量\n",
    "    metric = metric_func(pred_without_pad, tags_without_pad)\n",
    "    # print('*' * 8, metric) # 标量\n",
    "\n",
    "    return loss.item(), metric"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
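  {
   "cell_type": "markdown",
   "source": [
    "The `acc` metric computed in the two step functions is token-level accuracy over non-padded positions. Since most tokens carry the `O` tag, NER results are often also reported per entity span. A minimal sketch of span-level F1 follows, assuming BIO tag strings like those in the dataset sample; the helper names `bio_spans` and `span_f1` are hypothetical, and predicted tag ids would first have to be mapped back to tag strings.\n",
    "\n",
    "```python\n",
    "def bio_spans(tags):\n",
    "    # Collect (label, start, end) spans from a BIO tag sequence; end is exclusive\n",
    "    spans = []\n",
    "    start, label = None, None\n",
    "    for i, tag in enumerate(list(tags) + ['O']):  # sentinel closes a span at the end\n",
    "        continues = tag.startswith('I-') and label == tag[2:]\n",
    "        if not continues:\n",
    "            if label is not None:\n",
    "                spans.append((label, start, i))\n",
    "            if tag.startswith('B-'):\n",
    "                start, label = i, tag[2:]\n",
    "            else:\n",
    "                start, label = None, None  # 'O' or a stray 'I-' tag opens nothing\n",
    "    return spans\n",
    "\n",
    "def span_f1(true_tags, pred_tags):\n",
    "    # A predicted span counts as correct only if label and both boundaries match\n",
    "    true_set, pred_set = set(bio_spans(true_tags)), set(bio_spans(pred_tags))\n",
    "    hits = len(true_set & pred_set)\n",
    "    if hits == 0:\n",
    "        return 0.0\n",
    "    precision, recall = hits / len(pred_set), hits / len(true_set)\n",
    "    return 2 * precision * recall / (precision + recall)\n",
    "\n",
    "gold = ['O', 'B-LOC', 'I-LOC', 'O', 'B-LOC', 'I-LOC']\n",
    "pred = ['O', 'B-LOC', 'I-LOC', 'O', 'O', 'O']\n",
    "print(bio_spans(gold))                # [('LOC', 1, 3), ('LOC', 4, 6)]\n",
    "print(round(span_f1(gold, pred), 3))  # 0.667\n",
    "```\n",
    "\n",
    "In this toy example four of the six tokens are tagged correctly, yet only one of the two entities is recovered, which is the gap a span-level score makes visible."
   ],
   "metadata": {
    "collapsed": false
   }
  },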
  {
   "cell_type": "code",
   "execution_count": 14,
   "outputs": [],
   "source": [
    "# 打印时间\n",
    "def printbar():\n",
    "    nowtime = datetime.datetime.now().strftime('%Y-%m_%d %H:%M:%S')\n",
    "    print('\\n' + \"==========\"*8 + '%s'%nowtime)\n",
    "\n",
    "\n",
    "def train_model(model, train_dloader, val_dloader, optimizer, num_epochs=10, print_every=150):\n",
    "    starttime = time.time()\n",
    "    print('*' * 27, 'start training...')\n",
    "    printbar()\n",
    "\n",
    "    best_metric = 0.\n",
    "    for epoch in range(1, num_epochs+1):\n",
    "        # 训练\n",
    "        loss_sum, metric_sum = 0., 0.\n",
    "        for step, (inps, tags) in enumerate(train_dloader, start=1):\n",
    "            loss, metric = train_step(model, inps, tags, optimizer)\n",
    "            loss_sum += loss\n",
    "            metric_sum += metric\n",
    "\n",
    "            # 打印batch级别日志\n",
    "            if step % print_every == 0:\n",
    "                print('*'*27, f'[step = {step}] loss: {loss_sum/step:.3f}, {metric_name}: {metric_sum/step:.3f}')\n",
    "\n",
    "        # 验证 一个epoch的train结束，做一次验证\n",
    "        val_loss_sum, val_metric_sum = 0., 0.\n",
    "        for val_step, (inps, tags) in enumerate(val_dloader, start=1):\n",
    "            val_loss, val_metric = validate_step(model, inps, tags)\n",
    "            val_loss_sum += val_loss\n",
    "            val_metric_sum += val_metric\n",
    "\n",
    "\n",
    "        # 记录和收集 1个epoch的训练和验证信息\n",
    "        # columns=['epoch', 'loss', metric_name, 'val_loss', 'val_'+metric_name]\n",
    "        record = (epoch, loss_sum/step, metric_sum/step, val_loss_sum/val_step, val_metric_sum/val_step)\n",
    "        df_history.loc[epoch - 1] = record\n",
    "\n",
    "        # 打印epoch级别日志\n",
    "        print('EPOCH = {} loss: {:.3f}, {}: {:.3f}, val_loss: {:.3f}, val_{}: {:.3f}'.format(\n",
    "               record[0], record[1], metric_name, record[2], record[3], metric_name, record[4]))\n",
    "        printbar()\n",
    "\n",
    "        # 保存最佳模型参数\n",
    "        current_metric_avg = val_metric_sum/val_step\n",
    "        if current_metric_avg > best_metric:\n",
    "            best_metric = current_metric_avg\n",
    "            checkpoint = save_dir + f'epoch{epoch:03d}_valacc{current_metric_avg:.3f}_ckpt.tar'\n",
    "            if device.type == 'cuda' and ngpu > 1:\n",
    "                model_sd = copy.deepcopy(model.module.state_dict())\n",
    "            else:\n",
    "                model_sd = copy.deepcopy(model.state_dict())\n",
    "            # 保存\n",
    "            torch.save({\n",
    "                'loss': loss_sum / step,\n",
    "                'epoch': epoch,\n",
    "                'net': model_sd,\n",
    "                'opt': optimizer.state_dict(),\n",
    "            }, checkpoint)\n",
    "\n",
    "\n",
    "    endtime = time.time()\n",
    "    time_elapsed = endtime - starttime\n",
    "    print('*' * 27, 'training finished...')\n",
    "    print('*' * 27, 'and it costs {} h {} min {:.2f} s'.format(int(time_elapsed // 3600),\n",
    "                                                               int((time_elapsed % 3600) // 60),\n",
    "                                                               (time_elapsed % 3600) % 60))\n",
    "\n",
    "    print('Best val Acc: {:4f}'.format(best_metric))\n",
    "    return df_history"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "### 开始训练"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "*************************** start training...\n",
      "\n",
      "================================================================================2021-02_10 15:40:52\n",
      "EPOCH = 1 loss: 45.688, acc: 0.801, val_loss: 20.320, val_acc: 0.886\n",
      "\n",
      "================================================================================2021-02_10 15:41:16\n",
      "EPOCH = 2 loss: 19.270, acc: 0.883, val_loss: 17.360, val_acc: 0.886\n",
      "\n",
      "================================================================================2021-02_10 15:41:43\n",
      "EPOCH = 3 loss: 16.447, acc: 0.884, val_loss: 14.993, val_acc: 0.890\n",
      "\n",
      "================================================================================2021-02_10 15:42:07\n",
      "EPOCH = 4 loss: 14.100, acc: 0.893, val_loss: 12.828, val_acc: 0.902\n",
      "\n",
      "================================================================================2021-02_10 15:42:36\n",
      "EPOCH = 5 loss: 11.959, acc: 0.907, val_loss: 10.922, val_acc: 0.915\n",
      "\n",
      "================================================================================2021-02_10 15:43:04\n",
      "EPOCH = 6 loss: 10.118, acc: 0.919, val_loss: 9.352, val_acc: 0.926\n",
      "\n",
      "================================================================================2021-02_10 15:43:30\n",
      "EPOCH = 7 loss: 8.649, acc: 0.931, val_loss: 8.150, val_acc: 0.935\n",
      "\n",
      "================================================================================2021-02_10 15:43:55\n",
      "EPOCH = 8 loss: 7.531, acc: 0.939, val_loss: 7.258, val_acc: 0.941\n",
      "\n",
      "================================================================================2021-02_10 15:44:22\n",
      "EPOCH = 9 loss: 6.677, acc: 0.946, val_loss: 6.583, val_acc: 0.946\n",
      "\n",
      "================================================================================2021-02_10 15:44:49\n",
      "EPOCH = 10 loss: 6.008, acc: 0.951, val_loss: 6.056, val_acc: 0.951\n",
      "\n",
      "================================================================================2021-02_10 15:45:18\n",
      "EPOCH = 11 loss: 5.464, acc: 0.955, val_loss: 5.633, val_acc: 0.953\n",
      "\n",
      "================================================================================2021-02_10 15:45:47\n",
      "EPOCH = 12 loss: 5.010, acc: 0.958, val_loss: 5.284, val_acc: 0.956\n",
      "\n",
      "================================================================================2021-02_10 15:46:15\n",
      "EPOCH = 13 loss: 4.623, acc: 0.961, val_loss: 4.990, val_acc: 0.958\n",
      "\n",
      "================================================================================2021-02_10 15:46:41\n",
      "EPOCH = 14 loss: 4.285, acc: 0.963, val_loss: 4.740, val_acc: 0.959\n",
      "\n",
      "================================================================================2021-02_10 15:47:07\n",
      "EPOCH = 15 loss: 3.990, acc: 0.966, val_loss: 4.525, val_acc: 0.960\n",
      "\n",
      "================================================================================2021-02_10 15:47:32\n",
      "EPOCH = 16 loss: 3.728, acc: 0.968, val_loss: 4.339, val_acc: 0.962\n",
      "\n",
      "================================================================================2021-02_10 15:47:55\n",
      "EPOCH = 17 loss: 3.493, acc: 0.970, val_loss: 4.176, val_acc: 0.963\n",
      "\n",
      "================================================================================2021-02_10 15:48:20\n",
      "EPOCH = 18 loss: 3.281, acc: 0.972, val_loss: 4.032, val_acc: 0.964\n",
      "\n",
      "================================================================================2021-02_10 15:48:44\n",
      "EPOCH = 19 loss: 3.087, acc: 0.973, val_loss: 3.905, val_acc: 0.965\n",
      "\n",
      "================================================================================2021-02_10 15:49:10\n",
      "EPOCH = 20 loss: 2.909, acc: 0.975, val_loss: 3.795, val_acc: 0.966\n",
      "\n",
      "================================================================================2021-02_10 15:49:37\n",
      "EPOCH = 21 loss: 2.744, acc: 0.976, val_loss: 3.691, val_acc: 0.967\n",
      "\n",
      "================================================================================2021-02_10 15:50:01\n",
      "EPOCH = 22 loss: 2.592, acc: 0.978, val_loss: 3.606, val_acc: 0.967\n",
      "\n",
      "================================================================================2021-02_10 15:50:27\n",
      "EPOCH = 23 loss: 2.448, acc: 0.979, val_loss: 3.524, val_acc: 0.968\n",
      "\n",
      "================================================================================2021-02_10 15:50:54\n",
      "EPOCH = 24 loss: 2.317, acc: 0.980, val_loss: 3.456, val_acc: 0.968\n",
      "\n",
      "================================================================================2021-02_10 15:51:19\n",
      "EPOCH = 25 loss: 2.192, acc: 0.981, val_loss: 3.381, val_acc: 0.969\n",
      "\n",
      "================================================================================2021-02_10 15:51:47\n",
      "EPOCH = 26 loss: 2.076, acc: 0.982, val_loss: 3.329, val_acc: 0.969\n",
      "\n",
      "================================================================================2021-02_10 15:52:13\n",
      "EPOCH = 27 loss: 1.965, acc: 0.983, val_loss: 3.278, val_acc: 0.970\n",
      "\n",
      "================================================================================2021-02_10 15:52:40\n",
      "EPOCH = 28 loss: 1.862, acc: 0.984, val_loss: 3.236, val_acc: 0.970\n",
      "\n",
      "================================================================================2021-02_10 15:53:05\n",
      "EPOCH = 29 loss: 1.764, acc: 0.985, val_loss: 3.195, val_acc: 0.971\n",
      "\n",
      "================================================================================2021-02_10 15:53:32\n",
      "EPOCH = 30 loss: 1.671, acc: 0.986, val_loss: 3.159, val_acc: 0.971\n",
      "\n",
      "================================================================================2021-02_10 15:53:59\n",
      "*************************** training finished...\n",
      "*************************** and it costs 0 h 13 min 6.35 s\n",
      "Best val Acc: 0.970863\n"
     ]
    },
    {
     "data": {
      "text/plain": "    epoch       loss       acc   val_loss   val_acc\n0     1.0  45.687649  0.801478  20.319972  0.885931\n1     2.0  19.270487  0.882515  17.359610  0.886046\n2     3.0  16.446749  0.884191  14.992622  0.890471\n3     4.0  14.099821  0.893419  12.827554  0.902364\n4     5.0  11.959304  0.906506  10.921850  0.915174\n5     6.0  10.118494  0.919399   9.352380  0.926360\n6     7.0   8.649001  0.930854   8.150103  0.934836\n7     8.0   7.530997  0.939349   7.258168  0.941337\n8     9.0   6.677365  0.945722   6.582935  0.946342\n9    10.0   6.007839  0.950506   6.055818  0.950501\n10   11.0   5.464437  0.954606   5.632995  0.953218\n11   12.0   5.010489  0.957950   5.284153  0.955965\n12   13.0   4.622941  0.960834   4.990327  0.957904\n13   14.0   4.285432  0.963429   4.739633  0.959177\n14   15.0   3.990457  0.965596   4.524728  0.960348\n15   16.0   3.728434  0.967803   4.338606  0.961539\n16   17.0   3.493265  0.969835   4.175720  0.962785\n17   18.0   3.281112  0.971526   4.032297  0.963835\n18   19.0   3.087202  0.973075   3.904875  0.964902\n19   20.0   2.908959  0.974604   3.794503  0.965694\n20   21.0   2.743989  0.976074   3.690805  0.966660\n21   22.0   2.591643  0.977504   3.606392  0.967307\n22   23.0   2.448340  0.978775   3.523581  0.967721\n23   24.0   2.317027  0.979927   3.455711  0.968222\n24   25.0   2.191622  0.981041   3.381256  0.968691\n25   26.0   2.075960  0.982126   3.329331  0.969128\n26   27.0   1.964940  0.983188   3.278260  0.969544\n27   28.0   1.861808  0.984200   3.236310  0.969904\n28   29.0   1.763881  0.985222   3.194886  0.970553\n29   30.0   1.671001  0.986171   3.159388  0.970863",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>epoch</th>\n      <th>loss</th>\n      <th>acc</th>\n      <th>val_loss</th>\n      <th>val_acc</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>1.0</td>\n      <td>45.687649</td>\n      <td>0.801478</td>\n      <td>20.319972</td>\n      <td>0.885931</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>2.0</td>\n      <td>19.270487</td>\n      <td>0.882515</td>\n      <td>17.359610</td>\n      <td>0.886046</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>3.0</td>\n      <td>16.446749</td>\n      <td>0.884191</td>\n      <td>14.992622</td>\n      <td>0.890471</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>4.0</td>\n      <td>14.099821</td>\n      <td>0.893419</td>\n      <td>12.827554</td>\n      <td>0.902364</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>5.0</td>\n      <td>11.959304</td>\n      <td>0.906506</td>\n      <td>10.921850</td>\n      <td>0.915174</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>6.0</td>\n      <td>10.118494</td>\n      <td>0.919399</td>\n      <td>9.352380</td>\n      <td>0.926360</td>\n    </tr>\n    <tr>\n      <th>6</th>\n      <td>7.0</td>\n      <td>8.649001</td>\n      <td>0.930854</td>\n      <td>8.150103</td>\n      <td>0.934836</td>\n    </tr>\n    <tr>\n      <th>7</th>\n      <td>8.0</td>\n      <td>7.530997</td>\n      <td>0.939349</td>\n      <td>7.258168</td>\n      <td>0.941337</td>\n    </tr>\n    <tr>\n      <th>8</th>\n      <td>9.0</td>\n      <td>6.677365</td>\n      <td>0.945722</td>\n      <td>6.582935</td>\n      <td>0.946342</td>\n    </tr>\n    <tr>\n      <th>9</th>\n      
<td>10.0</td>\n      <td>6.007839</td>\n      <td>0.950506</td>\n      <td>6.055818</td>\n      <td>0.950501</td>\n    </tr>\n    <tr>\n      <th>10</th>\n      <td>11.0</td>\n      <td>5.464437</td>\n      <td>0.954606</td>\n      <td>5.632995</td>\n      <td>0.953218</td>\n    </tr>\n    <tr>\n      <th>11</th>\n      <td>12.0</td>\n      <td>5.010489</td>\n      <td>0.957950</td>\n      <td>5.284153</td>\n      <td>0.955965</td>\n    </tr>\n    <tr>\n      <th>12</th>\n      <td>13.0</td>\n      <td>4.622941</td>\n      <td>0.960834</td>\n      <td>4.990327</td>\n      <td>0.957904</td>\n    </tr>\n    <tr>\n      <th>13</th>\n      <td>14.0</td>\n      <td>4.285432</td>\n      <td>0.963429</td>\n      <td>4.739633</td>\n      <td>0.959177</td>\n    </tr>\n    <tr>\n      <th>14</th>\n      <td>15.0</td>\n      <td>3.990457</td>\n      <td>0.965596</td>\n      <td>4.524728</td>\n      <td>0.960348</td>\n    </tr>\n    <tr>\n      <th>15</th>\n      <td>16.0</td>\n      <td>3.728434</td>\n      <td>0.967803</td>\n      <td>4.338606</td>\n      <td>0.961539</td>\n    </tr>\n    <tr>\n      <th>16</th>\n      <td>17.0</td>\n      <td>3.493265</td>\n      <td>0.969835</td>\n      <td>4.175720</td>\n      <td>0.962785</td>\n    </tr>\n    <tr>\n      <th>17</th>\n      <td>18.0</td>\n      <td>3.281112</td>\n      <td>0.971526</td>\n      <td>4.032297</td>\n      <td>0.963835</td>\n    </tr>\n    <tr>\n      <th>18</th>\n      <td>19.0</td>\n      <td>3.087202</td>\n      <td>0.973075</td>\n      <td>3.904875</td>\n      <td>0.964902</td>\n    </tr>\n    <tr>\n      <th>19</th>\n      <td>20.0</td>\n      <td>2.908959</td>\n      <td>0.974604</td>\n      <td>3.794503</td>\n      <td>0.965694</td>\n    </tr>\n    <tr>\n      <th>20</th>\n      <td>21.0</td>\n      <td>2.743989</td>\n      <td>0.976074</td>\n      <td>3.690805</td>\n      <td>0.966660</td>\n    </tr>\n    <tr>\n      <th>21</th>\n      <td>22.0</td>\n      <td>2.591643</td>\n      <td>0.977504</td>\n   
   <td>3.606392</td>\n      <td>0.967307</td>\n    </tr>\n    <tr>\n      <th>22</th>\n      <td>23.0</td>\n      <td>2.448340</td>\n      <td>0.978775</td>\n      <td>3.523581</td>\n      <td>0.967721</td>\n    </tr>\n    <tr>\n      <th>23</th>\n      <td>24.0</td>\n      <td>2.317027</td>\n      <td>0.979927</td>\n      <td>3.455711</td>\n      <td>0.968222</td>\n    </tr>\n    <tr>\n      <th>24</th>\n      <td>25.0</td>\n      <td>2.191622</td>\n      <td>0.981041</td>\n      <td>3.381256</td>\n      <td>0.968691</td>\n    </tr>\n    <tr>\n      <th>25</th>\n      <td>26.0</td>\n      <td>2.075960</td>\n      <td>0.982126</td>\n      <td>3.329331</td>\n      <td>0.969128</td>\n    </tr>\n    <tr>\n      <th>26</th>\n      <td>27.0</td>\n      <td>1.964940</td>\n      <td>0.983188</td>\n      <td>3.278260</td>\n      <td>0.969544</td>\n    </tr>\n    <tr>\n      <th>27</th>\n      <td>28.0</td>\n      <td>1.861808</td>\n      <td>0.984200</td>\n      <td>3.236310</td>\n      <td>0.969904</td>\n    </tr>\n    <tr>\n      <th>28</th>\n      <td>29.0</td>\n      <td>1.763881</td>\n      <td>0.985222</td>\n      <td>3.194886</td>\n      <td>0.970553</td>\n    </tr>\n    <tr>\n      <th>29</th>\n      <td>30.0</td>\n      <td>1.671001</td>\n      <td>0.986171</td>\n      <td>3.159388</td>\n      <td>0.970863</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train_dloader = load_data(data_base_dir + 'train.txt', word_to_id)\n",
    "val_dloader = load_data(data_base_dir + 'val.txt', word_to_id)\n",
    "\n",
    "model = BiLSTM_CRF(len(word_to_id), hidden_size)\n",
    "model = model.to(device)\n",
    "if ngpu > 1:\n",
    "    model = torch.nn.DataParallel(model, device_ids=list(range(ngpu)))  # 设置并行执行  device_ids=[0,1,2,3]\n",
    "\n",
    "optimizer = torch.optim.Adam(model.parameters(), lr=LR)\n",
    "train_model(model, train_dloader, val_dloader, optimizer, num_epochs=EPOCHS, print_every=50)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "outputs": [
    {
     "data": {
      "text/plain": "<Figure size 432x288 with 1 Axes>",
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX4AAAEWCAYAAABhffzLAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAuuklEQVR4nO3deXxU1f3/8dcnCVsElDUigQS/7opCBcEqrUvdKIp146tR0dJSlyr2a6kL+nP5grWb1bYuxdYN4r61VVwRvlbrBrihILiExY2AgCCbST6/P84NhDATJiSTyeS+n4/HPGbmzJ17PzcDn3PvOeeea+6OiIjER06mAxARkaalxC8iEjNK/CIiMaPELyISM0r8IiIxo8QvIhIzSvzSYGb2lJmNbOxlM8nMyszsB2lYr5vZLtHr28zsylSW3YbtlJjZs9saZx3rPcTMFjf2eqVp5WU6AMkMM1td420+sB6ojN7/zN1LU12Xux+TjmVbOnc/pzHWY2bFwCdAK3eviNZdCqT8G0q8KPHHlLu3r35tZmXAT9z9+drLmVledTIRkZZBTT2ymepTeTO7xMy+AO40s05m9oSZlZvZ8uh1YY3vTDezn0SvzzKzl8zs99Gyn5jZMdu4bB8ze9HMVpnZ82Z2s5lNThJ3KjH+r5m9HK3vWTPrWuPzM8xsgZktM7Nxdfx9BpnZF2aWW6PsR2b2TvT6ADN7xcxWmNnnZvYXM2udZF13mdn4Gu/HRt/5zMx+XGvZH5rZm2b2tZktMrOra3z8YvS8wsxWm9mB1X/bGt//rpm9YWYro+fvpvq3qYuZ7Rl9f4WZvWdmx9X4bKiZvR+t81Mz+2VU3jX6fVaY2Vdm9m8zUy5qQvpjSyI7Ap2BImA04d/JndH73sBa4C91fH8Q8AHQFfgt8Hczs21Y9l7gdaALcDVwRh3bTCXG04Czge5Aa6A6Ee0F3Bqtf6doe4Uk4O6vAd8Ah9Va773R60rgF9H+HAgcDpxXR9xEMRwdxXMEsCtQu3/hG+BMYAfgh8C5ZnZ89Nn3oucd3L29u79Sa92dgSeBP0X7dgPwpJl1qbUPW/xtthJzK+BfwLPR9y4ASs1s92iRvxOaDTsA+wAvROUXA4uBbkABcDmguWOakBK/JFIFXOXu6919rbsvc/dH3H2Nu68CJgDfr+P7C9z9dnevBO4GehD+g6e8rJn1BgYC/8/dN7j7S8A/k20wxRjvdPd57r4WeBDoF5WfBDzh7i+6+3rgyuhvkMx9wKkAZtYBGBqV4e4z3f1Vd69w9zLgrwniSOSUKL7Z7v4NoaKruX/T3f1dd69y93ei7aWyXggVxXx3nxTFdR8wFzi2xjLJ/jZ1GQy0B66PfqMXgCeI/jbAt8BeZtbR3Ze7+6wa5T2AInf/1t3/7Zo0rEkp8Usi5e6+rvqNmeWb2V+jppCvCU0LO9Rs7qjli+oX7r4metm+nsvuBHxVowxgUbKAU4zxixqv19SIaaea644S77Jk2yIc3Z9gZm2AE4BZ7r4gimO3qBnjiyiO6whH/1uzWQzAglr7N8jMpkVNWSuBc1Jcb/W6F9QqWwD0rPE+2d9mqzG7e81KsuZ6TyRUigvM7P/M7MCo/HfAh8CzZvaxmV2a2m5IY1Hil0RqH31dDOwODHL3jmxqWkjWfNMYPgc6m1l+jbJedSzfkBg/r7nuaJtdki3s7u8TEtwxbN7MA6HJaC6waxTH5dsSA6G5qqZ7CWc8vdx9e+C2Guvd2tHyZ4QmsJp6A5+mENfW1turVvv8xvW6+xvuPpzQDPQ44UwCd1/l7he7+87AccD/mNnhDYxF6kGJX1LRgdBmviJqL74q3RuMjqBnAFebWevoaPHYOr7SkBgfBoaZ2cFRR+y1bP3/xr3AGEIF81CtOL4GVpvZHsC5KcbwIHCWme0VVTy14+9AOANaZ2YHECqcauWEpqmdk6x7CrCbmZ1mZnlmNgLYi9As0xCvEc4OfmVm
rczsEMJvdH/0m5WY2fbu/i3hb1IFYGbDzGyXqC9nJaFfpK6mNWlkSvySihuBdsBS4FXg6Sbabgmhg3QZMB54gHC9QSI3so0xuvt7wPmEZP45sJzQ+ViX6jb2F9x9aY3yXxKS8irg9ijmVGJ4KtqHFwjNIC/UWuQ84FozWwX8P6Kj5+i7awh9Gi9HI2UG11r3MmAY4axoGfArYFituOvN3TcQEv0xhL/7LcCZ7j43WuQMoCxq8jqH8HtC6Lx+HlgNvALc4u7TGhKL1I+pT0WyhZk9AMx197SfcYi0ZDril2bLzAaa2X+ZWU403HE4oa1YRBpAV+5Kc7Yj8Ciho3UxcK67v5nZkESyn5p6RERiRk09IiIxkxVNPV27dvXi4uJMhyEiklVmzpy51N271S7PisRfXFzMjBkzMh2GiEhWMbPaV2wDauoREYkdJX4RkZhR4hcRiZmsaOMXkZbn22+/ZfHixaxbt27rC0ud2rZtS2FhIa1atUppeSV+EcmIxYsX06FDB4qLi0l+nx7ZGndn2bJlLF68mD59+qT0nRbb1FNaCsXFkJMTnkt122mRZmXdunV06dJFSb+BzIwuXbrU68ypRR7xl5bC6NGwJrqFx4IF4T1ASUny74lI01LSbxz1/Tu2yCP+ceM2Jf1qa9aEchGRuGuRiX/hwvqVi4jESYtM/L1r37RuK+Ui0vw1dr/dihUruOWWW+r9vaFDh7JixYp6f++ss87i4Ycfrvf30qFFJv4JEyA/f/Oy/PxQLiLZp7rfbsECcN/Ub9eQ5J8s8VdUVNT5vSlTprDDDjts+4abgRaZ+EtKYOJE6NQpvC8sDO/VsSvSfB1yyJaP6rx82WWJ++3GjAmvly7d8rtbc+mll/LRRx/Rr18/Bg4cyJAhQzjuuOPYa6+9ADj++OPZf//92XvvvZk4ceLG7xUXF7N06VLKysrYc889+elPf8ree+/NkUceydq1a1Pa16lTp9K/f3/69u3Lj3/8Y9avX78xpr322ot9992XX/7ylwA89NBD7LPPPuy3335873vfS2n9W9MiR/VASPKtWsGIEfDUU7DPPpmOSES21eIkd0Betmzb13n99dcze/Zs3nrrLaZPn84Pf/hDZs+evXEs/B133EHnzp1Zu3YtAwcO5MQTT6RLly6brWP+/Pncd9993H777Zxyyik88sgjnH766XVud926dZx11llMnTqV3XbbjTPPPJNbb72VM844g8cee4y5c+diZhubk6699lqeeeYZevbsuU1NTIm02MQPoR1w8GCoqsp0JCKyNdOnJ/+sd+/QvFNbUVF47tq17u+n4oADDtjsAqg//elPPPbYYwAsWrSI+fPnb5H4+/TpQ79+/QDYf//9KSsr2+p2PvjgA/r06cNuu+0GwMiRI7n55pv5+c9/Ttu2bRk1ahTDhg1j2LBhABx00EGcddZZnHLKKZxwwgkN28lIi2zqqXbAAfDKK7DvvpmOREQaoin67bbbbruNr6dPn87zzz/PK6+8wttvv03//v0TXiDVpk2bja9zc3O32j9Ql7y8PF5//XVOOukknnjiCY4++mgAbrvtNsaPH8+iRYvYf//9WdaQ05zqbTV4DSIiaVbdPzduXBiW3bt3SPoN6bfr0KEDq1atSvjZypUr6dSpE/n5+cydO5dXX3112zdUy+67705ZWRkffvghu+yyC5MmTeL73/8+q1evZs2aNQwdOpSDDjqInXfeGYCPPvqIQYMGMWjQIJ566ikWLVq0xZlHfbXoxO8OAwfCiSeGziERyV4lJY07QKNLly4cdNBB7LPPPrRr146CgoKNnx199NHcdttt7Lnnnuy+++4MHjy40bbbtm1b7rzzTk4++WQqKioYOHAg55xzDl999RXDhw9n3bp1uDs33HADAGPHjmX+/Pm4O4cffjj77bdfg2PIiputDxgwwLf1Dlw9e8JRR8EddzRyUCLSIHPmzGHPPffMdBgtRqK/p5nNdPcBtZdt0W38AAUF
sGRJpqMQEWk+WnRTD0D37vDll5mOQkTi4vzzz+fll1/erGzMmDGcffbZGYpoSy0+8RcUwJw5mY5CROLi5ptvznQIW9XiE/+BB2ocv4hITS2+jf+cc2DSpExHISLSfLT4xC8iIptr8Yn/pZegSxeo1dciIhJbLT7xt28PX32lkT0iWS/DN9Ju37590s/KysrYJ4tmgmzxnbvVF+Mp8YtkMd1Iu1G1+MTfrVt4VuIXacYuugjeeiv556++CtGc9RutWQOjRsHttyf+Tr9+cOONSVd56aWX0qtXL84//3wArr76avLy8pg2bRrLly/n22+/Zfz48QwfPrw+e8K6des499xzmTFjBnl5edxwww0ceuihvPfee5x99tls2LCBqqoqHnnkEXbaaSdOOeUUFi9eTGVlJVdeeSUjRoyo1/a2RYtP/Hl5oY1fV++KZLHaSX9r5SkYMWIEF1100cbE/+CDD/LMM89w4YUX0rFjR5YuXcrgwYM57rjjMLOU13vzzTdjZrz77rvMnTuXI488knnz5nHbbbcxZswYSkpK2LBhA5WVlUyZMoWddtqJJ598EgiTwzWFFp/4Ac4+G6Kb6ohIc1THkTkQ2vSTTci/jRPx9+/fnyVLlvDZZ59RXl5Op06d2HHHHfnFL37Biy++SE5ODp9++ilffvklO+64Y8rrfemll7jgggsA2GOPPSgqKmLevHkceOCBTJgwgcWLF3PCCSew66670rdvXy6++GIuueQShg0bxpAhQ7ZpX+qrxXfuAvzudyH5i0iWStOE/CeffDIPP/wwDzzwACNGjKC0tJTy8nJmzpzJW2+9RUFBQcJ5+LfFaaedxj//+U/atWvH0KFDeeGFF9htt92YNWsWffv25YorruDaa69tlG1tTdoTv5nlmtmbZvZE9L6Pmb1mZh+a2QNm1jrdMUCDzghFJNOqb6RdVARm4bkRbqQ9YsQI7r//fh5++GFOPvlkVq5cSffu3WnVqhXTpk1jQaKzjK0YMmQIpdGIo3nz5rFw4UJ23313Pv74Y3beeWcuvPBChg8fzjvvvMNnn31Gfn4+p59+OmPHjmXWrFkN2p9UNcUR/xig5mw5vwH+6O67AMuBUekOYOzYMFmbiGSxkhIoKwtzsJSVNcponr333ptVq1bRs2dPevToQUlJCTNmzKBv377cc8897LHHHvVe53nnnUdVVRV9+/ZlxIgR3HXXXbRp04YHH3yQffbZh379+jF79mzOPPNM3n33XQ444AD69evHNddcwxVXXNHgfUpFWufjN7NC4G5gAvA/wLFAObCju1eY2YHA1e5+VF3rach8/ADXXRfu3LN2LbRtu82rEZFGpPn4G1dzmo//RuBXQPU0aV2AFe5efWPKxUDPRF80s9FmNsPMZpSXlzcoiOqjfY3sERFJ46geMxsGLHH3mWZ2SH2/7+4TgYkQjvgbEkvNi7h6927ImkQkzt59913OOOOMzcratGnDa6+9lqGItk06h3MeBBxnZkOBtkBH4CZgBzPLi476C4FP0xgDsOmIXxdxiTQv7l6vMfKZ1rdvX96q60KzDKlvk33amnrc/TJ3L3T3YuC/gRfcvQSYBpwULTYS+Ee6YqjWp0/o4C0qSveWRCRVbdu2ZdmyZfVOWrI5d2fZsmW0rUcHZiYu4LoEuN/MxgNvAn9P9wa7d4ff/jbdWxGR+igsLGTx4sU0tA9PQiVaWFiY8vJNkvjdfTowPXr9MXBAU2y3plWrYMOGMH2DiGReq1at6NOnT6bDiKVYXLkL0L8/RFdRi4jEWmwSf/fu6twVEYEYJf6CAiV+ERGIWeLXBVwiIjFK/N27w9KlUFGx9WVFRFqyWMzHDzB0aBjRU1kZbs4iIhJXsUmBgweHh4hI3MWmqWfDBpgzB5Yvz3QkIiKZFZvEX1YWbr84ZUqmIxERyazYJH5N1CYiEsQm8W+/PbRurSGdIiKxSfxmunpXRARilPhBV++KiECMhnMCXH015OdnOgoRkcyK
VeIfNizTEYiIZF6smno++wyefx50wx8RibNYJf7774cjjoCVKzMdiYhI5sQq8RcUhGd18IpInCnxi4jETKwSv67eFRGJWeLXEb+ISMyGc3btCv/4B/Trl+lIREQyJ1aJPzcXjjsu01GIiGRWrJp6AF56CV54IdNRiIhkTqyO+AGuugrWroX//CfTkYiIZEbsjvg1UZuIxF0sE7/m5BeROItd4u/eHVavhjVrMh2JiEhmxC7xV4/l11G/iMRV7BL/sGEwYwb06JHpSEREMiN2o3q6d980dYOISBzF7oh/3Tq4/XZ4881MRyIikhmxS/wAo0fDlCmZjkJEJDNil/jbtoXtt1fnrojEV+wSP4Q2fl3EJSJxFcvEr6t3RSTO0pb4zaytmb1uZm+b2Xtmdk1U3sfMXjOzD83sATNrna4YktHVuyISZ+k84l8PHObu+wH9gKPNbDDwG+CP7r4LsBwYlcYYErrpJs3QKSLxlbbE78Hq6G2r6OHAYcDDUfndwPHpiiGZnj03XcErIhI3aW3jN7NcM3sLWAI8B3wErHD3imiRxUDPJN8dbWYzzGxGeXl5o8b13ntw9dWwfHmjrlZEJCukNfG7e6W79wMKgQOAPerx3YnuPsDdB3Tr1q1R4/rgA7jmGliwoFFXKyKSFZpkVI+7rwCmAQcCO5hZ9VQRhcCnTRFDTbrpuojEWTpH9XQzsx2i1+2AI4A5hArgpGixkcA/0hVDMtVz9Whkj4jEUTonaesB3G1muYQK5kF3f8LM3gfuN7PxwJvA39MYQ0I64heROEtb4nf3d4D+Cco/JrT3Z0yHDtCmjY74RSSeYjctM4AZfP55mLNHRCRuYpn4ATp1ynQEIiKZEcu5egAmTQpDOkVE4ia2iX/6dJg4MdNRiIg0vdgm/u7dQ+duVVWmIxERaVqxTfwFBVBRoWkbRCR+Yp34QUM6RSR+Ypv4u3cPY/l1xC8icRPb4ZyHHQZr14Yx/SIicRLbxK+ELyJxFdumHoBRo+CeezIdhYhI04p14p8yBV56KdNRiIg0rVgn/u7dNUOniMRPrBN/QYGGc4pI/MQ68euIX0TiKNaJv0+fMDe/iEicpJT4zWyMmXW04O9mNsvMjkx3cOn2v/8Lb7+d6ShERJpWqkf8P3b3r4EjgU7AGcD1aYtKRETSJtXEX32501Bgkru/V6Msa739NvzgBzrqF5F4STXxzzSzZwmJ/xkz6wBk/YTGFRUwdSqUlWU6EhGRppPqlA2jgH7Ax+6+xsw6A2enLaomUj1Dp0b2iEicpHrEfyDwgbuvMLPTgSuAlekLq2l07x6elfhFJE5STfy3AmvMbD/gYuAjIOtnuWndGnbYQYlfROIl1cRf4e4ODAf+4u43Ay1iBPx3vwtdumQ6ChGRppNqG/8qM7uMMIxziJnlAK3SF1bTefLJTEcgItK0Uj3iHwGsJ4zn/wIoBH6XtqhERCRtUkr8UbIvBbY3s2HAOnfP+jZ+gBtvhEGDMh2FiEjTSXXKhlOA14GTgVOA18zspHQG1lS+/hpefx02bMh0JCIiTSPVNv5xwEB3XwJgZt2A54GH0xVYU6key79kCRQWZjYWEZGmkGobf0510o8sq8d3m7WaiV9EJA5SPeJ/2syeAe6L3o8ApqQnpKalq3dFJG5SSvzuPtbMTgQOioomuvtj6Qur6RQWhonattsu05GIiDSNVI/4cfdHgEfSGEtG9OoFzz2X6ShERJpOnYnfzFYBnugjwN29Y1qiEhGRtKmzg9bdO7h7xwSPDs0+6ZeWQnEx5OSE59LSpIseeij87GdNFpmISEalbWSOmfUys2lm9r6ZvWdmY6Lyzmb2nJnNj547NfrGS0th9GhYsADcw/Po0UmT/9q18MknjR6FiEizlM4hmRXAxe6+FzAYON/M9gIuBaa6+67A1Oh94xo3Dtas2bxszZpQnkBBgYZzikh8pC3xu/vn7j4rer0KmAP0JMzweXe02N3A8Y2+8YUL61XevbuG
c4pIfDTJRVhmVgz0B14DCtz98+ijL4CCRt9g7971Ki8ogPJyqMr6m0mKiGxd2hO/mbUnDAO9yN2/rvlZNMd/olFDmNloM5thZjPKy8vrt9EJEyA/v/YK4aqrEi6+di20bQt5eVvtBxYRyXppTfxm1oqQ9Evd/dGo+Esz6xF93gNI2Lru7hPdfYC7D+jWrVv9NlxSAhMnQlFRSPjduoVO3mnTwnMNpaVw223wzTcp9QOLiGS9dI7qMeDvwBx3v6HGR/8ERkavRwL/SEsAJSVQVhbab5YsgauvhkmT4JZbNlusnv3AIiJZz9wTtrQ0fMVmBwP/Bt4FqlvPLye08z8I9AYWAKe4+1d1rWvAgAE+Y8aMhgVUVQXDh8PTT8P06XBQmH0iJ2eLk4AofrX5i0h2M7OZ7j6gdnnKUzbUl7u/RLjCN5HD07XdpHJywhH/wIFw0kkwaxb06EHv3qF5p7Zk/cMiItmuRUytnLIddoDHHgt3Xzn5ZNiwIWE/cG4ujB+fkQhFRNIuXokfYJ994I474OWX4eKLt+gH7tQJKivh008zHaiISHqkramnWRsxAt54A/7wBxg4kJIzz6SkJHzkDr/5DZx+emZDFBFJl/gd8Ve7/no45JAwO9uECRsndLM+xVzaq5SePcORf1lZhuMUEWlk8U38eXnwwAPQrh1ceWXCCd3OPz8M/vnss0wHKyLSeOKb+CFM0tOmzZbjOaOB/OedBytXwvHHh6t7RURagngnfkg+O9vChey7L0yeHLoDfvKTxOP9RUSyjRL/ViZ0O/740AVw772h01dEJNsp8ScayJ+Xt9lA/ssugzFjYP36lG/qJSLSbCnx1x7I37EjVFSEcf5R245ZuOD3t79N+aZeIiLNlhI/bD6h24oVcMklYcrOCy7YmPw1mZuItBTxvICrLmbw61+HQfy//32Yv+HGG1m4MPG0Q8lu9iUi0lwp8SdiFtp1KirgxhshN5fevf7AggTJv0ePpg9PRKQhlPiTMYMbbgjNP3/8I1OG5jKw/LesWbt58l+/PjT51O4fFhFprpT462IWjvgrK9nr5t8zt99c7N132KlyEZ/l9mbmCRNYPbxESV9EsooS/9aYwZ//DHPn0mvqExuLCysXUPjkaBgOUMLUqdCnD+y8c8YiFRFJiUb1pMIM5s/fsjwa1rNuHZx5Jhx6KHzySdOHJyJSH0r8qVq0KHH5woW0bQtPPAGrVoXkrxk9RaQ5U+JPVbKpHXr1AqB/f3j++XBzr0MOCV0DuspXRJojJf5UJZraAaBz541Xdn3nOyH5L1kCY8fqKl8RaZ6U+FNVe2qHoiI4+2x4+2046qhwxS8h+XfqFC4BqElX+YpIc2GeBXMNDxgwwGfMmJHpMBJ74AE44wzYe294+mkoKCAnJ/EUzmbhsgARkaZgZjPdfUDtch3xN9SIEfCvf8G8eXDwwVBWtrWZnkVEMkqJvzEcdVRo3F+6FA46iD+f937C7oATT2z60EREalPibywHHggvvghVVRz7myH85+hrWZRbTCU5LMotZkzXUm66CaZMyXSgIhJ3unK3MfXtG+bxHzyY/R69amNxYeUC/vjNaL7zAzj44JIMBigioiP+xrfzztC69RbFtnYNZ84dR8eOYYTPrbfqHr4ikhlK/Onw2WeJy6PJ+ydNgvPOCzdwrz3sU0Qk3ZT40yHZ8J2ePYFwMddVV8Edd8CgQWFxXeErIk1FiT8dkl3l++23MGcOZnD11WFit1mzwjRAusJXRJqKEn86JLrK98orQ3YfNCjM6Ab83/9t+VVd4Ssi6aYrd5vSwoXwox/Bm2/ChAnkXH4pzpa3c9QVviLSGHTlbnPQuzf8+9/hat/LL+fx/NMYyZ18Qhjv/wnFnEopHTsq8YtI+ijxN7X8fLj3Xvj1rzl2zf3cwSiKWUAOTjELuJ3RDF1ZyrHHwvLlmQ5WRFoiJf5MMINLL8W6dSOHzZvatmMNf+08jueegwEDwuSfIiKNSYk/k5Yu
TVjcYflCXnwRKith5comjklEWry0JX4zu8PMlpjZ7Bplnc3sOTObHz13Stf2s0Ky8f4FBQweHCb8/N73QtGUKXDPPbqrl4g0XDqP+O8Cjq5Vdikw1d13BaZG7+Mr0Xh/M/jiC7jsMlr7egDeew9++MNw3xfd1UtEGiptid/dXwS+qlU8HLg7en03cHy6tp8VEo33nzgRRo2C66+H/feHGTPYe2/o2nXLkT4a8y8i26Kp2/gL3P3z6PUXQEETb7/5KSmBsrKQ1cvKwgQ+f/tbaNtZvhwGD4Yrr+TrpRs4ldIthn5G0/+IiKQsY9Myu7ubWdKrx8xsNDAaoHccb111zDEwezb84hcwfjwf59xN56py2rEOYOPQz66doKqqhBx104tIipo6XXxpZj0AouclyRZ094nuPsDdB3Tr1q3JAmxWOnWCu+6Cf/2LHv7pxqRfbTvWcNWGcQwZEuoIEZFUNHXi/ycwMno9EvhHE28/Ow0btsV4/2qdv1nIBx9A//6hvX/t2iaOTUSyTjqHc94HvALsbmaLzWwUcD1whJnNB34QvZdUJGnuss6dmTu7gtNOg+uug333DfPBadiniCSjSdqyRWlpGL+5Zs2mMrMwtnPXXeGaa5jadQSnnZ7DypWwfv2mxfLzw2ChEt31USRWNElbtks09HPSJHj8cWjbFk47jcN/sS/HVz3KCes3H/0zfE2phn2KyEY64m8JqqrgoYfCbb0++IBKjNwafQLfkM9Pmch1n5RQXJy5MEWkaemIvyXLyQlTPc+ezVc5XTZL+hBG/1zHOHbZJdzkXUTiTYm/JcnLo1NV7YulgyIW8Jfhz3LgAZUAfPwxzJwZug7UESwSL0r8LYwVJRn9k5PDOY8eRb8f9YErr+Rvl33EgAEw5fRSpi8opsJzmL6gmOfPLlXyF2nh1Mbf0iQa/ZOfD7fcEp7vuAOefRaqqphre9LHP6INGzYu+g35XNZlIn9aqiFAItlObfxxkWzit5Ej4eST4amnwtSeEybwXz5/s6QPoT/gf5aN49574euvN5WrSUik5dARf4xVWU7CK4IdmMA4prYeyk4/GkTP3rmU31jKVd+OozcLWUhvrmk1gR/cWaJrA0SasWRH/Er8Mba6azHtly3Yorwyrw05XoFVVvKVdWaO784AZqpJSCTLqKlHttD+pglUtN78RjAVrfPJvevvWHk5PPAA25ccy2BeS9gk9Ktll7Bs6ZYHDi+dV8rivGKqLIfFecW8dJ7ahUSaEx3xx11paZjdbeHCMB/QhAlbzO2QrEkI4HN2ZG7n75I75Lt875Lv8p+75rHfxPPYjk2dy9+Qz5vnTuTgW3R2INKU1NQj2yxZk9Catp2Yv+tQus77Dz3XfwKE/gFLsI7FuUUUVpRtXphCpSMi2y5Z4s/YjVgke7S/aQIVPx5N3oZNR/EVrfPJ/9uf2S9K1JWffkHu66/ACSckXEfPygVwxhmw335hCtEPP6TiF2M3rXPBgrANUPIXSTO18cvWlZSQd8fmQ0Tz7th8us/cnjvCj37Ep7lFCVexjrYwbRqMHQtHHQXnn79ZRQKQt2ENay64ZMubC2ssqUijUuKX1NS+N3CSo/Ky0RP4hs07jL8hnxnn/A0WL+alx5dyVu8XkvQYQP7yT/H27cOZwcknw/DhVJ7143DtgfvGM4OkyV+VhMhWqY1fGt1L55VSPHEcO1Uu5LPc3pSNnrBFx26ZFVPMlv0Gy+jMJBvJd9rPYzebR8HX8xP2GVS0akfeuT+FPn1Cgi8uhjfegIsu2vKq5UQ3I1D/gsRAsjZ+3L3ZP/bff3+XluWCLpN9Nfnu4TjeHXw1+X5Ox8k+bpz7Mce4FxS4V2KbLVP9qAKvaLddws+2eHTp4v788+5z5rivWuU+ebJ7/ubb9vz8UJ7I5MnuRUXuZuE52XIizQwwwxPk1Iwn9VQeSvwtz+TJ7me1muyfUOSVmH9CkZ/VavIWOfUTihIm808ocqjyrrbUzz1ghvvD
D3tVKpUAhASeqLxrV/dp09zff9/9q6/cq6rqV0mogpBmRolfmp1U8mSyM4Nzt5/s993nfuWV7tddF5ZdlFuUMKF/aju5T5/uPnmyL/3Vb1KvIFq3ds/NTfxZly7u//qX+6uvun/4ofvtt6fnLEKViTSAEr9kpVTPDNzdTyNxJXEqYeGKipDLk51FfJHTw1c8OtWrJpe633CD+69+lVoFUddju+3cx451//Wv3f/6V/eHHnK/7DL3tm23Xkmkq0lKlUlsKPFL1ko1TxUVuZ/K5pXEqUz2oqLw+YYN7pMmhWUSVRCnMdnBvX1797593Y891v2LtkUJE/o3nXZyf+019yefdL/rrrqTf5s2qVUSubnu/fq5H3KI+/DhodJItFz37u7/+Y/77Nnuixa5r1wZdiyVSkKVSawo8UuLl2pOS1ZBFBa6//GP7hde6H7cce777pv8LOJ0m+zf/7776ae7z5jhvqpLUcIkvapLUegr+OYb98WL3d95J3kfA7gPG+Y+ZEioeRp6tgHu7dq5//d/u//kJ+4XXeS+/faJlysocJ850/2DD0Kcy5e733NP9lQmqqASUuKXWEjl/3V98pRZ4koC3A8+2L242P2FF5L3RZyRO9nfeSesa86ccHLwVcei5JVETUWJl/OCAvennnJ/4IHQt/CHP9Sd/HfbzX2nndw7dmycyqR1a/dDD3UfOtT9xBOTn5l06uT+t7+533uv+2OPuT/9tPsVV2zZzNWuXdiP9etDJVnfHynVZWNYQSnxi9RQn+ajRDmtuvmoWl0VxIcfhmVuuil8N1lT0886TPZRo9wvvzwse+cRiZf797kJgk010N69Ey/Xvbv744+7l5aGvogbbqg7+Q8Z4j5ggPteezVOZVLz0bZt8rOiNm3cjzwynJKdcor7yJGhbS5ZxfOXv7hPnBhq3K5dEy+3447hbOfdd93nznX/+GP3P/85VEiNWZmkq4KqgxK/yDaoT/PR1vLu2rWhEqirkujRY/OBRImWa9XK/fDDQwvOhReGg+R/n5taJZHqcinvVF3LFRa6L1gQkumsWe4vv1x3M9f48WGY1tixdVcMgwa577ef++67J992Oh+tW4eznO23d8/JSbxMq1ahYhw8OFSShx225ZlO9WO77dxHjXL/2c/czz/fvUOH1P7uKVDiF9lGjd18tLV8WlnpXl5ed4488ED3XXbZ9J1k/RZ5eZu2O358yDHJOsDLy0NXRHVrS8qVxOTJ/m3rzZf7tvU27nx9l6tr2cJC9y+/DB3gH38cmrsSLVd9tvPQQ+GM58476078l1zifvHFodata7ljjnE/4ojQWX/wwXUv27NnaMLr0iX5MmZb7vtWKPGLpFl9mm8b6yyiWl2VRLWzz647pwwYEF63aRPOPFq1SlxJdOoUBhE98URoHanPkNu0VCbpaEJpygqqIevcCiV+kWakqc8iUlnu/vvdr78+XL7wk58kryRqPk48Mfk6W7cO66t22WWhdSRRZdK7t/uyZWHIbfW+p1qZuIcKZVFuWHZRblHipqt6LJeWykRt/Er8Iqlo7LOIxqhMCgvd580LlzHMnl332cbvfx/WtWFD8ibu6jOO6tft2iVvOu/QIXR+X3+9+803h+1PnrxlP2y7duGsZFv33T0NlUl9ltWoHiV+kVQ09kjBdDRJ9eqVeNlevcJIpmuvdf/lL5NXELB55/ett9bdv1ud/GfODE1XyeL85JNQkdx9t/sjj4Rm/NrX3TW0MsnEtE9K/CJSb43dJNUYlUlVlfuaNe5ffOH+9dd1n3G8/XZY31tvJV/GzP3RR+uubGrH2727e58+oS8k2ZnJZZeF5iv3cN1esn7boqIw4mvt2vrPC7g1SvwikjaNfQ1TU/dvrF8fKpIPPwyVRF2VycUXu59zTrhqu67lcnPDgCL3cCZTV8VzySWbXidb5zb07Srxi0h2aY79G/WpTKqqNg2NXb48+WjSoqIweeyvfx0uY6irgqgvJX4RabEy1b/R
1M1c9aXELyJSD5maqqcp2vh1z10RkWamsW4JneyeuzmNEeQ2BHO0mX1gZh+a2aWZiEFEpLkqKYGyMqiqCs/bkvTr0uSJ38xygZuBY4C9gFPNbK+mjkNEJK4yccR/APChu3/s7huA+4HhGYhDRCSWMpH4ewKLarxfHJVtxsxGm9kMM5tRXl7eZMGJiLR0GWnjT4W7T3T3Ae4+oFu3bpkOR0SkxchE4v8U6FXjfWFUJiIiTaDJh3OaWR4wDzickPDfAE5z9/fq+E45sKBWcVdgabrizICWtj/Q8vZJ+9P8tbR9auj+FLn7Fk0meQ1Y4TZx9woz+znwDJAL3FFX0o++s0XgZjYj0fjUbNXS9gda3j5pf5q/lrZP6dqfJk/8AO4+BZiSiW2LiMRds+3cFRGR9MjmxD8x0wE0spa2P9Dy9kn70/y1tH1Ky/5kxVw9IiLSeLL5iF9ERLaBEr+ISMxkXeJviTN7mlmZmb1rZm+ZWdbNP21md5jZEjObXaOss5k9Z2bzo+dOmYyxvpLs09Vm9mn0O71lZkMzGWN9mFkvM5tmZu+b2XtmNiYqz8rfqY79yebfqK2ZvW5mb0f7dE1U3sfMXoty3gNm1rrB28qmNv5oZs95wBGEOX7eAE519/czGlgDmVkZMMDds/LCEzP7HrAauMfd94nKfgt85e7XRxV0J3e/JJNx1keSfboaWO3uv89kbNvCzHoAPdx9lpl1AGYCxwNnkYW/Ux37cwrZ+xsZsJ27rzazVsBLwBjgf4BH3f1+M7sNeNvdb23ItrLtiF8zezZD7v4i8FWt4uHA3dHruwn/KbNGkn3KWu7+ubvPil6vAuYQJkfMyt+pjv3JWtFNs1ZHb1tFDwcOAx6OyhvlN8q2xJ/SzJ5ZyIFnzWymmY3OdDCNpMDdP49efwEUZDKYRvRzM3snagrKimaR2sysGOgPvEYL+J1q7Q9k8W9kZrlm9hawBHgO+AhY4e4V0SKNkvOyLfG3VAe7+3cIN6c5P2pmaDGie39mT5ticrcC/wX0Az4H/pDRaLaBmbUHHgEucveva36Wjb9Tgv3J6t/I3SvdvR9h8soDgD3SsZ1sS/wtcmZPd/80el4CPEb4wbPdl1E7bHV77JIMx9Ng7v5l9B+zCridLPudonbjR4BSd380Ks7a3ynR/mT7b1TN3VcA04ADgR2iyS2hkXJetiX+N4Bdo17u1sB/A//McEwNYmbbRZ1TmNl2wJHA7Lq/lRX+CYyMXo8E/pHBWBpFdYKM/Igs+p2ijsO/A3Pc/YYaH2Xl75Rsf7L8N+pmZjtEr9sRBrHMIVQAJ0WLNcpvlFWjegCi4Vk3smlmzwmZjahhzGxnwlE+hEnz7s22fTKz+4BDCFPIfglcBTwOPAj0JkypfYq7Z01naZJ9OoTQhOBAGfCzGu3jzZqZHQz8G3gXqIqKLye0i2fd71TH/pxK9v5G+xI6b3MJB+UPuvu1UY64H+gMvAmc7u7rG7StbEv8IiLSMNnW1CMiIg2kxC8iEjNK/CIiMaPELyISM0r8IiIxo8QvsWVmlTVmcXyrMWd7NbPimjN7ijQnGbnZukgzsTa6PF4kVnTEL1JLdH+E30b3SHjdzHaJyovN7IVoArCpZtY7Ki8ws8eiedTfNrPvRqvKNbPbo7nVn42uxsTMLozmkX/HzO7P0G5KjCnxS5y1q9XUM6LGZyvdvS/wF8KV4gB/Bu52932BUuBPUfmfgP9z9/2A7wDvReW7Aje7+97ACuDEqPxSoH+0nnPSs2siyenKXYktM1vt7u0TlJcBh7n7x9FEYF+4exczW0q4+ce3Ufnn7t7VzMqBwpqX0UdTBT/n7rtG7y8BWrn7eDN7mnCTl8eBx2vMwS7SJHTEL5KYJ3ldHzXnU6lkU5/aD4GbCWcHb9SYeVGkSSjxiyQ2osbzK9Hr/xBmhAUoIUwSBjAVOBc23khj+2QrNbMcoJe7TwMuAbYHtjjrEEknHWlInLWL7nZU7Wl3rx7S2cnM
3iEctZ8alV0A3GlmY4Fy4OyofAww0cxGEY7szyXcBCSRXGByVDkY8Kdo7nWRJqM2fpFaojb+Ae6+NNOxiKSDmnpERGJGR/wiIjGjI34RkZhR4hcRiRklfhGRmFHiFxGJGSV+EZGY+f/UKXXShjy7UgAAAABJRU5ErkJggg==\n"
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": "<Figure size 432x288 with 1 Axes>",
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAY4AAAEWCAYAAABxMXBSAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAA27UlEQVR4nO3deXxU1f3/8deHsCMCAm5AElQsslv54VKrVL9atFVcWhWjFWvFvdqqFUtdqtJqFxfqilVwQRF3WrdqBVsVrVBQEZciEAigRhRF9pDP749zA5PJTDKTZDKZ5P18POYxd+49995zZ+B+cs655xxzd0RERFLVItsZEBGR3KLAISIiaVHgEBGRtChwiIhIWhQ4REQkLQocIiKSFgUOaRTM7DkzO62+02aTmS0xs//LwHHdzPaIlu80sytSSVuL8xSZ2T9qm09pukz9OKS2zOybmI/tgY3AlujzWe4+peFz1XiY2RLgZ+7+Uj0f14E+7r6wvtKaWSGwGGjl7mX1klFpslpmOwOSu9x9u4rl6m6SZtZSNyORpkNVVVLvzGy4mZWY2WVm9gkwycy6mNnfzazUzL6MlnvG7DPTzH4WLY82s1fN7E9R2sVmdkQt0/Y2s3+Z2Roze8nMbjOzB5PkO5U8Xmtmr0XH+4eZdYvZfqqZFZvZKjMbV833s6+ZfWJmeTHrjjWzd6LlYWY2y8xWm9lKM7vVzFonOdZkM7su5vOl0T4rzOyncWl/YGZzzexrM1tmZlfHbP5X9L7azL4xs/0rvtuY/Q8ws7fM7Kvo/YBUv5s0v+cdzGxSdA1fmtlTMdtGmtm86Bo+NrMRyb5nyRwFDsmUnYEdgAJgDOHf2qTocz6wHri1mv33BT4EugF/AO4xM6tF2oeA/wBdgauBU6s5Zyp5PBk4HdgRaA1cAmBm/YA7ouPvGp2vJwm4+5vAWuCQuOM+FC1vAX4RXc/+wKHAudXkmygPI6L8HAb0AeLbV9YCPwE6Az8AzjGzY6JtB0Xvnd19O3efFXfsHYBngAnRtd0IPGNmXeOuocp3k0BN3/MDhKrP/tGxboryMAy4H7g0uoaDgCVJziGZ5O566VXnF+E/8P9Fy8OBTUDbatIPAb6M+TyTUNUFMBpYGLOtPeDAzumkJdyUyoD2MdsfBB5M8ZoS5fE3MZ/PBZ6Plq8EpsZs6xB9B/+X5NjXAfdGyx0JN/WCJGkvAp6M+ezAHtHyZOC6aPle4PqYdHvGpk1w3JuBm6Llwihty5jto4FXo+VTgf/E7T8LGF3Td5PO9wzsApQDXRKku6siv3pl96USh2RKqbtvqPhgZu3N7K6oKudrQtVI59jqmjifVCy4+7pocbs00+4KfBGzDmBZsgynmMdPYpbXxeRp19hju/taYFWycxFKF8eZWRvgOOC/7l4c5WPPqPrmkygfvyOUPmpSKQ9Acdz17WtmM6Iqoq+As1M8bsWxi+PWFQM9Yj4n+24qqeF77kX4zb5MsGsv4OMU8ysZpMAhmRL/uN7FwLeAfd19e7ZVjSSrfqoPK4EdzKx9zLpe1aSvSx5Xxh47OmfXZIndfQHhxnsElaupIFR5fUB4Gmp74Ne1yQOhxBXrIWA60MvdOwF3xhy3pscrVxCqlmLlA8tTyFe86r7nZYTfrHOC/ZYBu9fifFLPFDikoXQk1GWvjurLr8r0CaO/4GcDV5tZazPbHzgqQ3l8DPihmR0YNWRfQ83/vx4CLiTcOB+Ny8fXwDdm1hc4J8U8TANGm1m/KHDF578j4a/5DVF7wckx20oJVUS7JTn2s8CeZnaymbU0sxOBfsDfU8xbfD4Sfs/uvhJ4Drg9akRvZWYVgeUe4HQzO9TMWphZj+j7kQamwCEN5WagHfA58AbwfAOdt4jQwLyK0K7wCKG/SSI3U8s8uvt7wHmEYLAS+BIoqWG3h4GDgZfd/fOY9ZcQbuprgLujPKeSh+eia3gZWBi9xzoXuMbM1hDaZKbF
7LsOGA+8Fj3NtV/csVcBPySUFlYBvwJ+GJfvVN1M9d/zqcBmQqnrM0IbD+7+H0Lj+03AV8ArVC0FSQNQB0BpVszsEeADd894iUekqVKJQ5o0M/t/ZrZ7VLUxAhgJPJXlbInkNPUcl6ZuZ+AJQkN1CXCOu8/NbpZEcpuqqkREJC2qqhIRkbQ0i6qqbt26eWFhYbazISKSU+bMmfO5u3ePX98sAkdhYSGzZ8/OdjZERHKKmcWPFgCoqkpERNKkwCEiImlR4BARkbQ0izaORDZv3kxJSQkbNmyoObFU0bZtW3r27EmrVq2ynRURaWDNNnCUlJTQsWNHCgsLST4/kCTi7qxatYqSkhJ69+6d7eyISANrtlVVGzZsoGvXrgoatWBmdO3aVaU1kUZqyhQoLIQWLcL7lCn1e/xmGzgABY060Hcn0vBSCQhTpsCYMVBcDO7hfcyY+g0ezTpwiIhkW6qlg5oCwqZN4X3cOFi3rvK+69aF9fVFgUNEJAPqq3Swbh0sXAiXXJI4IIweDZ06wcCBYd3SpYnzk2x9bShwpKi+6wxXr17N7bffnvZ+Rx55JKtXr67byUUko1IJCFu2wGWXJQ4G558PAwZAly7QoQP06QOffEJCZWVw2mlwTjRPZH78hMFUv742FDhSkIk6w2SBo6ysrNr9nn32WTp37lz7E4tIraX6B+TllycOCGPHhuUbboDWrWF5khnbV6+GPfeEoiL43e9g8mTYccfEaQsKYMIEuOii8Hn8eGjfvnKa9u3D+nrj7k3+tc8++3i8BQsWVPp88MFVX7fdFrb16uUeQkblV9euYXtpadV9a3LiiSd627ZtffDgwT506FA/8MAD/aijjvI+ffq4u/vIkSP929/+tvfr18/vuuuurfsVFBR4aWmpL1682Pv27es/+9nPvF+/fn7YYYf5unXrkp5v4sSJPnToUB80aJAfd9xxvnbtWnd3/+STT/yYY47xQYMG+aBBg/y1115zd/f77rvPBw4c6IMGDfJTTjkl4THjv0ORXPXgg+4FBe5m4f3BBxOnad++8j2gdWv3kSPdTzvN/dBD3d9/P6RNdL+AcHx395dfdv/Nb9x32CFxuoKC1M7fvn3yvNZ0PakAZnuCe2rWb+oN8apr4DBL/g/BvXaBY/Hixd6/f393d58xY4a3b9/eFy1atHX7qlWr3N193bp13r9/f//888/dvXLgyMvL87lz57q7+49//GN/4IEHkp6vYn9393HjxvmECRPc3f2EE07wm266yd3dy8rKfPXq1T5//nzv06ePl5aWVspLPAUOacxSvXkmuiG3axeCwWWXuRcVhf/TLVsmvw/06uW+//7us2eHY+66a2oBIZ1gkM411RcFjjjp3PQKClL7R5CO+MAxfPjwStuvuuqqraWA7bff3mfNmhXlZVvg2GOPPbamv/766/3aa69Ner6ZM2f6gQce6AMGDPDCwkI/66yz3N29W7duvmHDhkppJ0yY4L/+9a9rvAYFDsmG2pYO2rVzryi8r1rlfvnl7qec4t6mTfKA0KqVe2Gh+4EHJk9TUYqo6fyZLh1kQrLA0Wx7jqdj/PjQphFbZ1nfdYYdOnTYujxz5kxeeuklZs2aRfv27Rk+fHjCznZt2rTZupyXl8f69euTHn/06NE89dRTDB48mMmTJzNz5sz6y7xIA6lob6z4v1jR3lhWBt/97ra2h1//umobw/r1cOmlIb0Z/OEP0KMHbNyY+FxmsGFDOCaE4xYnGGQ8UaNzUVF4HzcuPM2Unx/uFxXr49MmWt+YqXE8BUVFMHFiaIQyC+8TJ9btx+7YsSNr1qxJuO2rr76iS5cutG/fng8++IA33nij9ieKrFmzhl122YXNmzczJaZF79BDD+WOO+4AYMuWLXz11VcccsghPProo6xatQqAL774os7nF6lOTY3OGzbABx+EG3+yR1J33x2uuiqsW7Ys8Xkq/st17hwCRnFx+P+cSH7+tqAB6Tc6FxXBkiVQXh7e
cy04VEeBI0X1/Y+ga9eufOc732HAgAFceumllbaNGDGCsrIy9tprL8aOHct+++1Xt5MB1157Lfvuuy/f+c536Nu379b1t9xyCzNmzGDgwIHss88+LFiwgP79+zNu3DgOPvhgBg8ezC9/+cs6n1+ap7r0ZTjqKDjwwFAqaNcO9toLVq5Mfq5Jk+DCC8NyTY+kmkFeXlhONSBk4g/InJWo/qq+XsAI4ENgITA2wfYC4J/AO8BMoGe0/nvAvJjXBuCYaNtkYHHMtiE15aOubRySmL5DqU6ydoaKOvx77nH/8Y9DO0KitoMOHUKj9OjR7r/9rfsDD7jvvHNq7Y2NvdE5V9DQjeNAHvAxsBvQGngb6BeX5lHgtGj5EOCBBMfZAfgCaO/bAseP0smLAkdm6DtsnlK9ye60U+KbfH5+2H7OOe59+jTPRudckSxwZLKqahiw0N0XufsmYCowMi5NP+DlaHlGgu0APwKec/d1CbZJnPPOO48hQ4ZUek2aNCnb2ZIcUNtqpdNPh0MPhe99L1QrlZaGtJ9+mvg8Fe0Pt90GH31UfRtDvHSqi5pyG0O2ZfKpqh5AbBNVCbBvXJq3geOAW4BjgY5m1tXdV8WkOQm4MW6/8WZ2JaGaa6y7V3kuwszGAGMA8uuzr30jd9ttt2U7C5KDkj2ttH49DBoUGqY/+ABuvjmsi7V5M8ycCfvuC4cdtm2wvV69EjdSx7YzQPpPLebiU0hNTqJiSH28CCWFv8Z8PhW4NS7NrsATwFxC8CgBOsds3wUoBVrFrTOgDXAfcGVNeVFVVWboO2z8UqmuKS9379EjeZVRxau6DnCqVmqayEJV1XKgV8znntG6rdx9hbsf5+57A+OidatjkpwAPOnum2P2WRld00ZgEqFKTETiJKpW+tnP4KabwvYVK2CffaBjx+RjJgE89VQobaxbp2qlrEtnDPZMzuSUKJrUx4tQDbYI6M22xvH+cWm6AS2i5fHANXHb3wC+F7dul+jdgJuB62vKi0ocmaHvMHuq+wu9vDy85+cnLh107Bi2b9zo/v3vu//856mPmZTu00pNTjpFo3TGPKnt2CjxX355ufukSeHxtXr4kcjGkCPAkcBHhKerxkXrrgGOjpZ/BPwvSvNXoE3MvoWEEkqLuGO+DLwLzAceBLarKR8KHJmh7zA7Et0/WrUKYyUNHOg+alRIl2yMtWZTrVTfN/l0v6RU0iZK16aN+y9/6T55svtNN7lfeaX7BReE55MT/aAtWrh36lT92Cm1HCMpK4GjsbzqJXBk+X9Lhw4dGvR8qVDgqF81lSKKi90feyx56SAvz/3II93//OewT7pjrGX1n3hjvclv2eL+9dfuK1cmH7lwhx3Cl37tte5jx1Z/k2/ZMoyI2K1b1XNW14DUuXP1aS68MIzIePXV1R8nTQoccdK66TWC8rkCR9OW7J/Yeee5//CH7jvumNr9JZVjNrqAUN83+eXLk9/ku3QJN9dLLnE/++wwymF8tU7sF5rKjT2+6FfTTX70aPdzz3W/+OLqf8yPP3b/4gv3srJw7an+JVCPo7IqcMSpdNO78MLE46pXvJIVAdu0Sb7PhRdW+4Ncdtllfuutt279fNVVV/m1117rhxxyiO+9994+YMAAf+qpp7Zury5wrFmzJul+iebVSDYHR7oUOFJT071z06bkPaI7dXLfa68wxPett7q/+Wby+WGSzeFQ76WI+qqT37LF/csvk9/kO3cOXcZ/9Sv38893P/305H+lm6V3o2/Xzr17d/fddqs+3dVXu//xj+633x4m4EmUpkePcB2bNm279kzc5OtS/ZVLbRyN5VXnwFHdP6xaBo7//ve/ftBBB239vNdee/nSpUv9q6++cnf30tJS33333b08aumsLnBs3rw54X7J5tVINAdHbShw1CzZsBv33BO2P/548j94K+6HqRyzXkoSdS0dlJWFKp05c9z//vfkdWotW4a//Fu0SO0m36ZNSN+zZ/XprrjC/Q9/cL/j
juQ3+V69tv0FXyHVm3dDtXHUx9go9fQXgwJHnLRuepmYkMPd+/bt68uXL/d58+b5AQcc4Js2bfLzzjvPBw4c6IMHD/a2bdv6ypXh6ePqAkey/ZLNq5FoDo7aUOCoWbInm7p1C9sXLAhPNXXrlt4/sZTvC3UtHdx7b2hcmTPH/fnnk9+QW7RIPRBAqIP7zW9C20B1N/nNmyvnM5s3+bS++DS/+0b6tIECR5zG0MZxxRVX+C233OKXX36533LLLT5p0iQ/4YQTfFNU5C0oKPDFixe7e/WBI9l+ChyZk+z/emmp+9/+5v766+Fzqk82pf1PrK6lgzVrwjynL74Ynt7p1Cn1m36y17hxoT7tiSfcZ81K3qswl2/yzYwCR5zG8FTV/Pnzff/99/c+ffr4ihUr/Oabb/bzzz/f3d1ffvllB1IKHMn2q6iqqpg2tqKq6sQTT1RVVR0kunfl5VVuwK54JDZZiaNO7RHV3TzLy0P0mjcv1OEni1rpBIS773Z/8kn3V19NfU7UmvJZ64tPM63UiQJHnMZy0xswYMDWaWNLS0t9v/328wEDBvjo0aO9b9++KQWO6vabPHmy9+/f3wcNGuSnnXaau4fG8aOPPtoHDBjggwcP9tcr/jxOU2P5DutTTY/E/u9/yavu27Vzv+EG91decV+7dtvxUi5JpHJDXLPGfZddEmegZcuan+WveN1wQzj+zJnuCxem3uKeqTp5aZQUOOI0xZteQ2tq32Gye+LYse4//Wny6vVk1U8V/n3Og74sr8C3YL4sr8D/fU6KVTCtW7v/4Adh0oqhQ5M3hMS+LrnE/eab3R99NPXJK6q7+JzuASh1pcARp6nd9LKhqX2HyQJDp07hydDjjnO/7bZQWzOKB30xIRgspsBH8WDihuxEN+S2bUPP4AkTwrP8xx8fgkSyYNCnj/vhh7uPGeP++9+n3pKu0oHUkQJHnFy86b3zzjs+ePDgSq9hw4ZlLT+58h3WdD/8/HP3u+4K/xsSBQSzyk9w/vucB/0bKt+Qv6G9//vsB0Lns9dfd3/44VAdtN12yQNCRRD51reSb28244NIY6TAEWfBggVb+0hI+srLy3MicCS7x151VXgU1t39rbfC+iJLHBAu6BrTYW3ZsuRT26XzMnP/9NNtIxLm1Pgg0lwkCxwWtjVtQ4cO9dmzZ1dat3jxYjp27EjXrl2xihllJCXuzqpVq1izZg29e/fOdnaqVVgYhhNP5Kyz4M47w1De778PBQcXst2qqonLWrWlZe/8cKCNVeYMq+z228P44QUFYazxgQMTZ6CgIIwfXiF+JiUIsxklG4dcpAGY2Rx3Hxq/PpMzADZqPXv2pKSkhNKKeS4lLW3btqVnz57Zzka1ysth6VIYxRR+xzjyWcpS8vk145lqRfz+ms3w9gJazJlD///+FxIEDYCWmzeEafBGjoTddoOrroLPPquasKAAzjmn8rpUp7erCA7jxoVM5+eHNAoa0hglKoY0tVeiqirJfYlqazZscH/6afeTTw5jPF2wQ9Xqp0209MV5vSs/utqxY/JHWdXoLM0UauOQpiRZJ7yKcZ922MH9kqIVvr5D4uEsNrdsEx5dffhh9w8/DO0XanQWqSRZ4Gi2bRyS2woL4YDiylVQV/BbPm9bwG1HPU/vD5/H3nk7+QHMQl1WvClTVF0kEknWxqHAITll82Z4+ml44sdTuJsxdGBb24ET5hOmZUs48EAYMQJuuQVWrqx6oPjGaRGpIlngaJGNzIhU59Vzp1DSspBya0FJy0JePXcKANdfD/3yv+G+H/+NOzm7UtCAEDQ+b9EdvvgCZsyAyy6DP/4xNEbHStQ4LSIpy2iJw8xGALcAecBf3f36uO0FwL1Ad+AL4BR3L4m2bSHMLQ6w1N2Pjtb3BqYCXYE5wKnuvqm6fKjEkTtePXcKe99RuSSxnrZ8uu/RrFv8GX1KX6OVb95WuojjGOZxVVCqfhKplQavqjKzPOAj4DCgBHgLGOXuC2LS
PAr83d3vM7NDgNPd/dRo2zfuvl2C404DnnD3qWZ2J/C2u99RXV4UOHJHSctCem5J/FisDxqMjfg+fP/7cPrpIRDEUxWUSL3JRlXVMGChuy+KSgRTgZFxafoBL0fLMxJsr8RCT71DgMeiVfcBx9RXhiW71r+3iB5JgkY5hr09D264AQ45BH73O1VBiWRJJgNHD2BZzOeSaF2st4HjouVjgY5m1jX63NbMZpvZG2Z2TLSuK7Da3cuqOSYAZjYm2n+2Ovk1YuXl+PMvwA9/SNuBeyRNtiIvv/KKoqLQq7qgIDwhVVCgXtYiDSTbjeOXAAeb2VzgYGA5sCXaVhAVkU4Gbjaz3dM5sLtPdPeh7j60e/fu9ZppqYUpU8IztC1ahPe//pV110/g8x33wo4Ygc+ejV1xBa/86C+spXJJYi3tWTImQUmiqChUS5WXh3cFDZEGkckhR5YDvWI+94zWbeXuK4hKHGa2HXC8u6+Oti2P3heZ2Uxgb+BxoLOZtYxKHVWOKY3QlCmU/XQMLTdFDd7FxfiZZ9IeeJv9uP/bD3LGcz+i045tGA68em4XCieOY9ctS1mRl8+SMeM58HYFBZHGIpON4y0JjeOHEm7ubwEnu/t7MWm6AV+4e7mZjQe2uPuVZtYFWOfuG6M0s4CR7r4galB/PKZx/B13v726vKhxPLu+6ZZ48MDPW+7MijkrGTQoC5kSkRo1eON4VCI4H3gBeB+Y5u7vmdk1ZnZ0lGw48KGZfQTsBFTUR+wFzDaztwmN5tfHPI11GfBLM1tIaPO4J1PXIPVg7lw6JBk8cIeyTxU0RHKQeo5LZrz7Llx9NTzxBFsw8qj672wJBRT6kgbPmoikRj3HJTPiG73/+Ec46SR88GA2PvMiv2t5JWdxV8IG7xu76tFZkVykwCG1VzH5UHFxGEu2uBh+9Sv8ySe5f5ex7LJxCa8d/lsG3XIm57eayBIKKMdYQgHnt5rIvreowVskFzXbiZykHowbV3mCogo77shLw3/HX0bAySeHbhZTuhYxfFyRRv0QaQLUxiG14463yMMStF0kHC9KRHKO2jik/nzyCRx/fMKgAbA8vpe3iDQpChySOnd46CHo3x+efZaHOTFho/dlW9ToLdKUKXBIalauhGOPDQ0Te+4J8+Zx9vZTOZPKjd5nMpHXCtR4IdKUKXBIVbGP2BYUwDnnhFLGCy/An/5E+b9ehb59ueUWeLJtEb1ZQh7l9GYJT7cv0gC1Ik2cAodUFv+I7dKlcOed0K0bvP02T+9xMfsekMfq1TB6NPz1rxqgVqS5UeCQypI9YrtxI7c8tyfHHhsKIpuiORc1QK1I86PAIZUlmlUP8KXLuOgiOOaYMJ33jjs2aK5EpBFR4JBtNm6sOqtepJh8fvELePTRpElEpJlQ4JDgyy/DXN5r17KJVpU2raU9Lw4fz403Ql5elvInIo2GAofAokWw//4waxY/7zaF0Uyq8ojt+MVqvBCRQGNVNXdvvglHHQVlZfDii9w6/CAceJjKgcISN32ISDOkEkdz9vjjMHw4dOwIs2bBQQfRs2fipPkaRUREIgoczUlsx74uXeBHP4IhQ+CNN+Bb36KsDDp3rrpb+/aoU5+IbKXA0VzEd+xbvTq0dI8ZA927A/DRR7BsGZxxhjr1iUhyGR1W3cxGALcAecBf3f36uO0FwL1Ad+AL4BR3LzGzIcAdwPbAFmC8uz8S7TMZOBj4KjrMaHefV10+NKw6oaRRnGDu74KC0HMvUlq6NY6ISDPX4MOqm1kecBtwBNAPGGVm/eKS/Qm4390HAdcAv4/WrwN+4u79gRHAzWbWOWa/S919SPSal6lraFKSdOxj6VKeegpuuCEURBQ0RKQmmayqGgYsdPdF7r4JmAqMjEvTD3g5Wp5Rsd3dP3L3/0XLK4DPCKUSqQ132G67hJs27pRPURE88cS2YURERKqTycDRA1gW87kkWhfrbeC4aPlYoKOZdY1NYGbDgNbAxzGrx5vZO2Z2
k5m1SXRyMxtjZrPNbHZpaWldriP33XQTrFkDLSs/fV3erj2/WD+e7t1h+nRok/CbFBGpLNuN45cAB5vZXEK7xXJCmwYAZrYL8ABwuvvWuUgvB/oC/w/YAbgs0YHdfaK7D3X3od2bc/3Lo4/CxReHJ6gmTdra6l3eq4Bfd5vIFC/imWdgp52ynVERyRWZ7AC4HOgV87lntG6rqBrqOAAz2w443t1XR5+3B54Bxrn7GzH7rIwWN5rZJELwkURefRVOPRW+8x144AFo2xZOOQWAZ/4GN/8Y/va3MNWGiEiqMlnieAvoY2a9zaw1cBIwPTaBmXUzs4o8XE54wooo/ZOEhvPH4vbZJXo34BhgfgavIXd9+CGMHBlKGE8/DW3bVurGccEF8Kc/wWGHZTujIpJrMhY43L0MOB94AXgfmObu75nZNWZ2dJRsOPChmX0E7ARUdDM7ATgIGG1m86LXkGjbFDN7F3gX6AZcl6lryFmffgpHHBH6aTz3HHTtWqUbR3ExXHZZ6N4hIpKOjPbjaCyaVT+OtWvhe9+D+fNh5kwYNgxIuRuHiMhWyfpxaJDDpqSsDEaNgjlz4MkntwYNqLYbh4hIWrL9VJXUVfz4U3/7G0yYAEcfvTXJ5s3QunXi3TV4oYikSyWOXFbRcFExR/g334S+GnEjFU6bFib3a926cic/DV4oIrWhEkcuGzduW9CoUFYW1sc4+WT417/g3ns1eKGI1J1KHLmshoaLRYtCCaNvX/jud8MmBQoRqSuVOHJZNbMubdgQOosffrjGoBKR+qXAkavcYdddq66PGi4uugjmzoXbbkveMC4iUhsKHLnq1lvDfOEnnlil4WIKRdx1F/zqV2E6cRGR+qQ2jlz0xhth4MKjjoKHHgqP4kY++gjO+nZo09ATUyKSCSpx5JrPP4cTToAePeC++yoFDQj9Ms45B6ZOrTKKuohIvdCtJZeUl4fRbT/9FF5/PXT4i7iHJ3M7dIA//jGLeRSRJk8ljlwyfjy88ELoGb7PPkDljuOdO8Nf/pLVHIpIM6ASR6546SW46qpQ4hgzBqjacbysDMaOhR12UH8NEckcjY6bC0pKYO+9wzR9b74Z6qPQiLciklnJRsdVVVVjt3lzeOR2/Xp47LGtQQM04q2IZIcCR2NV0XjRunVoCB89OowdEqOajuMiIhmjwNEYxU7XV2HSpCrT9f3+96GjeCyNeCsimZbRwGFmI8zsQzNbaGZjE2wvMLN/mtk7ZjbTzHrGbDvNzP4XvU6LWb+Pmb0bHXNCNPd405Jo1Nt167aOevv11yGuHHZYGOFWI96KSEPKWOO4meUBHwGHASXAW8Aod18Qk+ZR4O/ufp+ZHQKc7u6nmtkOwGxgKODAHGAfd//SzP4D/Bx4E3gWmODuz1WXl5xrHG/RInTMiGeGbyln1KjQ3DFzJhx4YIPnTkSaiWw0jg8DFrr7InffBEwFRsal6Qe8HC3PiNn+feBFd//C3b8EXgRGmNkuwPbu/oaHiHc/cEwGr6HhlZVBmzaJt+XnM3EiPPIIXHedgoaIZEcmA0cPYFnM55JoXay3geOi5WOBjmbWtZp9e0TL1R0TADMbY2azzWx2aWlprS+iwV16KWzYUHVI2/btWTJmPBdeCCNGhAEMRUSyIaXAYWbHmlmnmM+dzeyYejj/JcDBZjYXOBhYDmyph+Pi7hPdfai7D+3evXt9HDLz7r0Xbr4Zfv7zhNP1nfHPIrp1g/vvrzJElYhIg0m15/hV7v5kxQd3X21mVwFPVbPPcqBXzOee0bqt3H0FUYnDzLYDjo+OvRwYHrfvzGj/nnHrKx0zZ736Kpx9dmjx/vOfwwiFca3c00bAsmWQK3FQRJqmVP9uTZSupqDzFtDHzHqbWWvgJGB6bAIz62ZmFce+HLg3Wn4BONzMuphZF+Bw4AV3Xwl8bWb7RU9T/QR4OsVraLyKi+G440K/jUceqTKs7axZoR9g164wZEhWcigislWqgWO2md1oZrtH
rxsJTzol5e5lwPmEIPA+MM3d3zOza8zs6CjZcOBDM/sI2AkYH+37BXAtIfi8BVwTrQM4F/grsBD4GKj2iapG75tvYOTIML/r9OmVRrwFmDcPvve9rU/iiohkXapVVRcAVwCPEB6PfRE4r6ad3P1ZwiOzseuujFl+DHgsyb73sq0EErt+NjAgxXw3buXlcNpp8O678MwzlXqGT5kCl18eqqby8mC33bKYTxGRGCkFDndfC1TpwCd19NvfwhNPwI03hkelIvGj3m7ZEib869hRnftEJPtSfarqRTPrHPO5i5m9kLFcNQePPgrXXAOnnw4XXVRpUw0dx0VEsirVNo5u7r664kPUKW/HjOSoKYuddenEE6FPH7jjjvDIbQyNeisijVmqgaPczLaOuWpmhYS2DklV7MCF7uFVUhLGDols2gQPPAC9eiU+hEa9FZHGINXAMQ541cweMLMHgVcIj89KqsaOrVr/tH791vqn9evh2GPhJz8JtVca9VZEGquUAoe7P08YcPBD4GHgYmB9BvPVdCxZEtowSkoSb1+6lK+/hiOOgOeeg7vugquv1qi3ItJ4pfRUlZn9DLiQ0FN7HrAfMAs4JGM5yyVTpoSSw9KloT5p/PjQfvHnP4eqqBYtwsx9a9dW2XVLj3wOPTT013joITjppLC+qEiBQkQap1Srqi4E/h9Q7O7fA/YGVmcqU41CbEN2YWGVSZQqpYttuyguDvVN++4Lzz8Pl1wCixeHokSC+qf5o8bzwQfw1FPbgoaISGOWagfADe6+wcwwszbu/oGZfSujOcum+I4UxcVwxhkwfz4MHx46VlS8LrqoattFeXnoAb5kCWy/fVhXVMSrr0HhxHHsumUpK/LyWXLaeA78QxGLfwXdujXg9YmI1EFKEzmZ2ZPA6cBFhOqpL4FW7n5kRnNXT9KeyKmwsPK0rbVhFgJIJD4WQZh24557VCUlIo1Tsomc0p4B0MwOBjoBz0cTNDV6aQeOambg49VXwxggeXlhMMIjj4SVK6umLSgIJY6Yj4n6YcQlExFpNOptBkB3f8Xdp+dK0KiVZB0m8vPhgANC+8XQoTBkCK8e80fWUrntYi3teWTweG69dVuhQ536RKSp0HRAiYwfn3JHilOeLeJMJrKEAsoxllDAmUzkpOlFXHBBaBcH2GmnxKdSpz4RyTUKHIkUFaXckWLpUniYInqzhDzK6c0SHqYIs1CDVTGq7Z//rE59ItI0KHAkU1QUGh/Ky8N7khbs6mq1dt552zBUacQiEZFGTYGjjsaPD09HxUpWkkgxFomINGoKHHVUVATDhoWHrFSSEJHmINUOgFKNli3Dg1avvZbtnIiIZF5GSxxmNsLMPjSzhWZWZQZBM8s3sxlmNtfM3jGzI6P1RWY2L+ZVbmZDom0zo2NWbMv6vCDFxXo6SkSaj4wFDjPLA24DjgD6AaPMrF9cst8A09x9b+Ak4HYAd5/i7kPcfQhwKrDY3efF7FdUsd3dP8vUNaRiy5YwL3hBQTZzISLScDJZ4hgGLHT3RVFnwanAyLg0DkSDOdEJWJHgOKOifRul0tLwrsAhIs1FJts4egDLYj6XAPvGpbka+IeZXQB0AP4vwXFOpGrAmWRmW4DHges8wbgpZjYGGAOQn8F6pJ13hg0boKwsY6cQEWlUsv1U1Shgsrv3BI4EHjCzrXkys32Bde4+P2afIncfCHw3ep2a6MDuPtHdh7r70O7du2fuCghDW7VundFTiIg0GpkMHMuB2Nmze0brYp0BTANw91lAWyB2gPGTCDMObuXuy6P3NcBDhCqxrJk2Dc48M7R1iIg0B5kMHG8Bfcyst5m1JgSB6XFplgKHApjZXoTAURp9bgGcQEz7hpm1NLNu0XIr4IfAfLJoxgx44onQj0NEpDnIWBuHu5eZ2fnAC0AecK+7v2dm1wCz3X06Ye7yu83sF4SG8tEx7RUHAcvcfVHMYdsAL0RBIw94Cbg7U9eQiuJiNYyLSPOS0Q6A7v4s
8GzcuitjlhcA30my70zC3Oax69YC+9R7RuuguBj23DPbuRARaTjZbhzPaRVTjKvEISLNiQJHHXzzDey4I+yxR7ZzIiLScDRWVR107AiLFtWcTkSkKVGJQ0RE0qLAUQcPPggjRsC6ddnOiYhIw1HgqIM5c+Df/4Z27bKdExGRhqPAUQcVT1RVTA8rItIcKHDUgR7FFZHmSIGjDhQ4RKQ5UuCopc2boX9/GDIk2zkREWlY6sdRS61awSuvZDsXIiINTyUOERFJiwJHLd13X6iqWrUq2zkREWlYChy19OGH8NFH0LlztnMiItKwFDhqqbgYevXSBE4i0vwocNSSHsUVkeZKgaOWlixR4BCR5kmBoxbc4dBD4eCDs50TEZGGl9HAYWYjzOxDM1toZmMTbM83sxlmNtfM3jGzI6P1hWa23szmRa87Y/bZx8zejY45wazhR4oyC09VnX56Q59ZRCT7MhY4zCwPuA04AugHjDKzfnHJfgNMc/e9gZOA22O2fezuQ6LX2THr7wDOBPpErxGZuoZkystDqUNEpDnKZIljGLDQ3Re5+yZgKjAyLo0D20fLnYAV1R3QzHYBtnf3N9zdgfuBY+o11ym4/37o1AmWLm3oM4uIZF8mA0cPYFnM55JoXayrgVPMrAR4FrggZlvvqArrFTP7bswxS2o4JgBmNsbMZpvZ7NLS0jpcRlXFxbBmDey0U70eVkQkJ2S7cXwUMNndewJHAg+YWQtgJZAfVWH9EnjIzLav5jhVuPtEdx/q7kO7d+9er5kuLoZddoE2ber1sCIiOSGTgxwuB3rFfO4ZrYt1BlEbhbvPMrO2QDd3/wzYGK2fY2YfA3tG+/es4ZgZpz4cItKcZbLE8RbQx8x6m1lrQuP39Lg0S4FDAcxsL6AtUGpm3aPGdcxsN0Ij+CJ3Xwl8bWb7RU9T/QR4OoPXkJACh4g0Zxkrcbh7mZmdD7wA5AH3uvt7ZnYNMNvdpwMXA3eb2S8IDeWj3d3N7CDgGjPbDJQDZ7v7F9GhzwUmA+2A56JXgyoqgr59G/qsIiKNg3kzeK506NChPnv27GxnQ0Qkp5jZHHcfGr8+243jOWf9evjqq2znQkQkexQ40jR9ehhKff78bOdERCQ7FDjSVFwc3vPzs5sPEZFsUeBIU3ExdOkC26fVq0REpOlQ4EiTHsUVkeZOgSNNChwi0txlsud4k/SLX0C3btnOhYhI9ihwpOmnP812DkREsktVVWn46qvwGO7GjdnOiYhI9ihwpGHmTBg4EN59N9s5ERHJHgWONFT04VDjuIg0ZwocaSguhnbt1DguIs2bAkcaiotDj3GzbOdERCR7FDjSoD4cIiJ6HDct48dDS31jItLM6TaYhsMPz3YORESyT1VVKfriC3j+eVi9Ots5ERHJLgWOFL31FhxxhPpwiIhkNHCY2Qgz+9DMFprZ2ATb881shpnNNbN3zOzIaP1hZjbHzN6N3g+J2WdmdMx50WvHTF5DBfXhEBEJMtbGYWZ5wG3AYUAJ8JaZTXf3BTHJfgNMc/c7zKwf8CxQCHwOHOXuK8xsAPAC0CNmvyJ3b9BJxIuLIS8Pdt21Ic8qItL4ZLLEMQxY6O6L3H0TMBUYGZfGgYopkToBKwDcfa67r4jWvwe0M7M2GcxrjYqLoWdPPVUlIpLJwNEDWBbzuYTKpQaAq4FTzKyEUNq4IMFxjgf+6+6xQwtOiqqprjBL3B3PzMaY2Wwzm11aWlrri6igPhwiIkG2G8dHAZPdvSdwJPCAmW3Nk5n1B24AzorZp8jdBwLfjV6nJjqwu09096HuPrR79+51zuhdd8FNN9X5MCIiOS+TgWM50Cvmc89oXawzgGkA7j4LaAt0AzCznsCTwE/c/eOKHdx9efS+BniIUCWWcf36wbe/3RBnEhFp3DIZON4C+phZbzNrDZwETI9LsxQ4FMDM9iIEjlIz6ww8A4x199cqEptZSzOrCCytgB8C8zN4DQCsWgUTJ0JJSabPJCLS
+GUscLh7GXA+4Ymo9wlPT71nZteY2dFRsouBM83sbeBhYLS7e7TfHsCVcY/dtgFeMLN3gHmEEszdmbqGCvPnw1lnwfvvZ/pMIiKNX0afEXL3ZwmN3rHrroxZXgB8J8F+1wHXJTnsPvWZx1SoD4eIyDbZbhzPCRWBIz8/u/kQEWkMFDhSUFwMO+0EbdtmOyciItmnwJEC9eEQEdlG/aBTMHUqfP11tnMhItI4KHCkoGvX8BIREVVV1ejLL+HKK2HBgprTiog0BwocNfjf/+Daa+Hjj2tOKyLSHChw1EB9OEREKlPgqIECh4hIZQocNSguhk6dwktERBQ4arR8uUobIiKx9DhuDR5/HNasyXYuREQaD5U4amAG229fczoRkeZCgaMaX38Np58Or7+e7ZyIiDQeChzVWLwYJk8O7RwiIhIocFRDj+KKiFSlwJHElCkwenRYPv748FlERDIcOMxshJl9aGYLzWxsgu35ZjbDzOaa2TtmdmTMtsuj/T40s++nesz6MGUKjBkTxqmCMNf4mDEKHiIiABam+M7Agc3ygI+Aw4AS4C1gVDRdbEWaicBcd7/DzPoBz7p7YbT8MDAM2BV4Cdgz2q3aYyYydOhQnz17dsp5LyzcVk0Vq6AAlixJ+TAiIjnNzOa4+9D49ZkscQwDFrr7InffBEwFRsalcaDiYddOwIpoeSQw1d03uvtiYGF0vFSOWWdLl6a3XkSkOclk4OgBLIv5XBKti3U1cIqZlQDPAhfUsG8qxwTAzMaY2Wwzm11aWppWxpPNLa45x0VEst84PgqY7O49gSOBB8ysXvLk7hPdfai7D+3evXta+44fD+3bV17Xvn1YLyLS3GUycCwHesV87hmti3UGMA3A3WcBbYFu1eybyjHrrKgIJk4MbRpm4X3ixLBeRKS5y2TgeAvoY2a9zaw1cBIwPS7NUuBQADPbixA4SqN0J5lZGzPrDfQB/pPiMetFUVFoCC8vD+8KGiIiQcYGOXT3MjM7H3gByAPudff3zOwaYLa7TwcuBu42s18QGspHe3jM6z0zmwYsAMqA89x9C0CiY2bqGkREpKqMPY7bmKT7OK6IiGTncVwREWmCFDhERCQtChwiIpKWZtHGYWalQPwgIt2Az7OQnUxpatcDTe+adD2NX1O7prpeT4G7V+kI1ywCRyJmNjtRo0+uamrXA03vmnQ9jV9Tu6ZMXY+qqkREJC0KHCIikpbmHDgmZjsD9aypXQ80vWvS9TR+Te2aMnI9zbaNQ0REaqc5lzhERKQWFDhERCQtzS5wNMSc5Q3NzJaY2btmNs/Mcm5QLjO718w+M7P5Met2MLMXzex/0XuXbOYxXUmu6WozWx79TvPM7Mhs5jEdZtbLzGaY2QIze8/MLozW5+TvVM315PJv1NbM/mNmb0fX9NtofW8zezO65z0SjSxet3M1pzaOVOZBz0VmtgQY6u452XHJzA4CvgHud/cB0bo/AF+4+/VRgO/i7pdlM5/pSHJNVwPfuPufspm32jCzXYBd3P2/ZtYRmAMcA4wmB3+naq7nBHL3NzKgg7t/Y2atgFeBC4FfAk+4+1QzuxN4293vqMu5mluJo0HmLJf0uPu/gC/iVo8E7ouW7yP8p84ZSa4pZ7n7Snf/b7S8BnifMG1zTv5O1VxPzvLgm+hjq+jlwCHAY9H6evmNmlvgSHnO8hzjwD/MbI6Zjcl2ZurJTu6+Mlr+BNgpm5mpR+eb2TtRVVZOVOvEM7NCYG/gTZrA7xR3PZDDv5GZ5ZnZPOAz4EXgY2C1u5dFSerlntfcAkdTdaC7fxs4AjgvqiZpMqLJvZpCneodwO7AEGAl8Oes5qYWzGw74HHgInf/OnZbLv5OCa4np38jd9/i7kMI02oPA/pm4jzNLXA0yJzlDc3dl0fvnwFPEv7B5LpPo3roivroz7Kcnzpz90+j/9jlwN3k2O8U1Zs/Dkxx9yei1Tn7OyW6nlz/jSq4+2pgBrA/0NnMKmZ7rZd7XnML
HA02Z3lDMbMOUeMeZtYBOByYX/1eOWE6cFq0fBrwdBbzUi8qbrCRY8mh3ylqeL0HeN/db4zZlJO/U7LryfHfqLuZdY6W2xEeAnqfEEB+FCWrl9+oWT1VBRA9Xncz2+YsH5/dHNWNme1GKGVAmEP+oVy7JjN7GBhOGAL6U+Aq4ClgGpBPGBL/BHfPmcbmJNc0nFAF4sAS4KyY9oFGzcwOBP4NvAuUR6t/TWgXyLnfqZrrGUXu/kaDCI3feYRCwTR3vya6R0wFdgDmAqe4+8Y6nau5BQ4REamb5lZVJSIidaTAISIiaVHgEBGRtChwiIhIWhQ4REQkLQocIrVkZltiRlGdV5+jLZtZYezIuiKNScuak4hIEuuj4R1EmhWVOETqWTQ/yh+iOVL+Y2Z7ROsLzezlaAC9f5pZfrR+JzN7MppH4W0zOyA6VJ6Z3R3NrfCPqDcwZvbzaB6Jd8xsapYuU5oxBQ6R2msXV1V1Ysy2r9x9IHArYaQCgL8A97n7IGAKMCFaPwF4xd0HA98G3ovW9wFuc/f+wGrg+Gj9WGDv6DhnZ+bSRJJTz3GRWjKzb9x9uwTrlwCHuPuiaCC9T9y9q5l9Tpg8aHO0fqW7dzOzUqBn7DAQ0VDfL7p7n+jzZUArd7/OzJ4nTBL1FPBUzBwMIg1CJQ6RzPAky+mIHU9oC9vaJH8A3EYonbwVM/KpSINQ4BDJjBNj3mdFy68TRmQGKCIMsgfwT+Ac2DoRT6dkBzWzFkAvd58BXAZ0AqqUekQySX+piNReu2i2tQrPu3vFI7ldzOwdQqlhVLTuAmCSmV0KlAKnR+svBCaa2RmEksU5hEmEEskDHoyCiwETorkXRBqM2jhE6lnUxjHU3T/Pdl5EMkFVVSIikhaVOEREJC0qcYiISFoUOEREJC0KHCIikhYFDhERSYsCh4iIpOX/A4T0LuQBa+rCAAAAAElFTkSuQmCC\n"
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Plot the training curves\n",
    "def plot_metric(df_history, metric):\n",
    "    plt.figure()\n",
    "\n",
    "    train_metrics = df_history[metric]\n",
    "    val_metrics = df_history['val_' + metric]\n",
    "\n",
    "    epochs = range(1, len(train_metrics) + 1)\n",
    "\n",
    "    plt.plot(epochs, train_metrics, 'bo--')\n",
    "    plt.plot(epochs, val_metrics, 'ro-')\n",
    "\n",
    "    plt.title('Training and validation ' + metric)\n",
    "    plt.xlabel(\"Epochs\")\n",
    "    plt.ylabel(metric)\n",
    "    plt.legend([\"train_\" + metric, 'val_' + metric])\n",
    "\n",
    "    plt.savefig(imgs_dir + metric + '.png')  # save the figure to disk\n",
    "    plt.show()\n",
    "\n",
    "plot_metric(df_history, 'loss')\n",
    "plot_metric(df_history, metric_name)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 5. Testing"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "outputs": [],
   "source": [
    "@torch.no_grad()\n",
    "def eval_step(model, inps, tags):\n",
    "    inps = inps.to(device)\n",
    "    tags = tags.to(device)\n",
    "    # mask: [b, seq_len] bool tensor, False wherever the input is padding (token id 0)\n",
    "    mask = inps.ne(0)\n",
    "\n",
    "    # forward\n",
    "    logits = model(inps)\n",
    "    # crf_decode is a custom method, so unwrap the model when it is wrapped in DataParallel\n",
    "    net = model.module if isinstance(model, torch.nn.DataParallel) else model\n",
    "    preds = net.crf_decode(logits, mask=mask, inp_logits=True)  # List[List]\n",
    "    pred_without_pad = []\n",
    "    for pred in preds:\n",
    "        pred_without_pad.extend(pred)\n",
    "    tags_without_pad = torch.masked_select(tags, mask).cpu()  # 1-D tensor\n",
    "\n",
    "    return torch.tensor(pred_without_pad), tags_without_pad"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "outputs": [],
   "source": [
    "def evaluate(model, test_dloader):\n",
    "    model.eval()  # switch to eval mode\n",
    "    starttime = time.time()\n",
    "    print('*' * 27, 'start evaluating...')\n",
    "    printbar()\n",
    "    preds, labels = [], []\n",
    "    for step, (inps, tags) in enumerate(tqdm(test_dloader), start=1):\n",
    "        pred, tags = eval_step(model, inps, tags)\n",
    "        preds.append(pred)\n",
    "        labels.append(tags)\n",
    "\n",
    "    y_true = torch.cat(labels, dim=0)\n",
    "    y_pred = torch.cat(preds, dim=0)\n",
    "    endtime = time.time()\n",
    "    print('evaluating costs: {:.2f}s'.format(endtime - starttime))\n",
    "    return y_true.cpu(), y_pred.cpu()"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "outputs": [],
   "source": [
    "def get_metrics(y_true, y_pred):\n",
    "    average = 'weighted'\n",
    "    print('*'*27, average+'_precision_score:{:.3f}'.format(precision_score(y_true, y_pred, average=average)))\n",
    "    print('*'*27, average+'_recall_score:{:.3f}'.format(recall_score(y_true, y_pred, average=average)))\n",
    "    print('*'*27, average+'_f1_score:{:.3f}'.format(f1_score(y_true, y_pred, average=average)))\n",
    "\n",
    "    print('*'*27, 'accuracy:{:.3f}'.format(accuracy_score(y_true, y_pred)))\n",
    "    print('*'*27, 'confusion_matrix:\\n', confusion_matrix(y_true, y_pred))\n",
    "    print('*'*27, 'classification_report:\\n', classification_report(y_true, y_pred))"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Run the test"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "vocab size: 4314\n",
      "*************************** Loading model weights...\n",
      "*************************** Model loaded success!\n",
      "*************************** start evaluating...\n",
      "\n",
      "================================================================================2021-02_10 16:03:36\n",
      "evaluating costs: 5.12s\n",
      "*************************** weighted_precision_score:0.969\n",
      "*************************** weighted_recall_score:0.97\n",
      "*************************** weighted_f1_score:0.969\n",
      "*************************** accuracy:0.970\n",
      "*************************** confusion_matrix:\n",
      " [[166169    168    277     53    158    138    797]\n",
      " [   353   2533    110     39      3    124     18]\n",
      " [   540     50   3481      4     66      1    211]\n",
      " [   193     40      5   1414     16      6      5]\n",
      " [   328     12     61     12   2799      0     13]\n",
      " [   318    129      8     16      1   1464     83]\n",
      " [  1091     11    234      4     36     50   6540]]\n",
      "*************************** classification_report:\n",
      "               precision    recall  f1-score   support\n",
      "\n",
      "           1       0.98      0.99      0.99    167760\n",
      "           2       0.86      0.80      0.83      3180\n",
      "           3       0.83      0.80      0.82      4353\n",
      "           4       0.92      0.84      0.88      1679\n",
      "           5       0.91      0.87      0.89      3225\n",
      "           6       0.82      0.73      0.77      2019\n",
      "           7       0.85      0.82      0.84      7966\n",
      "\n",
      "    accuracy                           0.97    190182\n",
      "   macro avg       0.88      0.83      0.86    190182\n",
      "weighted avg       0.97      0.97      0.97    190182\n",
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 10/10 [00:05<00:00,  1.96it/s]\n"
     ]
    }
   ],
   "source": [
    "print('vocab size:', len(word_to_id))\n",
    "\n",
    "checkpoint = save_dir + 'epoch030_valacc0.971_ckpt.tar'\n",
    "\n",
    "# Load the test data\n",
    "test_dloader = load_data(data_base_dir + 'test.txt', word_to_id)\n",
    "\n",
    "# Build the model\n",
    "reloaded_model = BiLSTM_CRF(len(word_to_id), hidden_size)\n",
    "reloaded_model = reloaded_model.to(device)\n",
    "if ngpu > 1:\n",
    "    reloaded_model = torch.nn.DataParallel(reloaded_model, device_ids=list(range(ngpu)))  # run in parallel across GPUs\n",
    "\n",
    "print('*' * 27, 'Loading model weights...')\n",
    "# map_location remaps the saved tensors onto the current device, so a checkpoint saved on a GPU also loads on the CPU\n",
    "ckpt = torch.load(checkpoint, map_location=device)  # dict\n",
    "model_sd = ckpt['net']\n",
    "# The saved keys carry no 'module.' prefix, so load into the wrapped module whenever DataParallel is in use\n",
    "if isinstance(reloaded_model, torch.nn.DataParallel):\n",
    "    reloaded_model.module.load_state_dict(model_sd)\n",
    "else:\n",
    "    reloaded_model.load_state_dict(model_sd)\n",
    "print('*' * 27, 'Model loaded success!')\n",
    "\n",
    "y_true, y_pred = evaluate(reloaded_model, test_dloader)\n",
    "get_metrics(y_true, y_pred)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 6. Prediction"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "source": [
    "@torch.no_grad()\n",
    "def predict(model, sentence, word_to_id):\n",
    "    # Map each character to its id; out-of-vocabulary characters fall back to unk_id\n",
    "    inp_ids = [word_to_id.get(w, unk_id) for w in sentence]\n",
    "    inp_ids = torch.tensor(inp_ids, dtype=torch.long).unsqueeze(dim=0)  # [1, seq_len]\n",
    "    # forward\n",
    "    logits = model(inp_ids)\n",
    "    preds = model.crf_decode(logits, inp_logits=True)  # List[List]\n",
    "    pred_ids = preds[0]\n",
    "    pred_tags = [id_to_tag[tag_id] for tag_id in pred_ids]\n",
    "\n",
    "    return pred_ids, pred_tags"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%"
    }
   },
   "execution_count": 36,
   "outputs": []
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "outputs": [],
   "source": [
    "def get_entity(pred_tags, pred_ids, sentence):\n",
    "    types = {'PER': 'per', 'LOC': 'loc', 'ORG': 'org'}\n",
    "    ner = {key: [] for key in types.values()}\n",
    "    i = 0\n",
    "    while i < len(pred_tags):\n",
    "        tag = pred_tags[i]\n",
    "        ent_type = tag[2:]  # e.g. 'PER' for 'B-PER'\n",
    "        # Skip padding (id 0), 'O', stray I- tags and unknown entity types\n",
    "        if pred_ids[i] == 0 or not tag.startswith('B-') or ent_type not in types:\n",
    "            i += 1\n",
    "            continue\n",
    "        # Extend the span over the following I- tags of the same type\n",
    "        j = i\n",
    "        while j + 1 < len(pred_tags) and pred_tags[j + 1] == 'I-' + ent_type:\n",
    "            j += 1\n",
    "        ner[types[ent_type]].append(''.join(sentence[i:j + 1]))\n",
    "        i = j + 1\n",
    "    return ner"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Run prediction"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "*************************** Loading model weights...\n",
      "*************************** Model loaded success!\n",
      "********** sentence: 日本知名学者石川一成先生曾撰文说：面对宝顶大佛湾造像，看中华民族囊括外来文化的能力和创造能力，不禁使我目瞪口呆。\n",
      "********** pred_ner: {'per': ['石川一成'], 'loc': ['日本', '中华'], 'org': []} \n",
      "\n",
      "********** sentence: 5月12日，北京市怀柔县民政局、畜牧局领导来到驻守在偏远山区的武警北京一总队十支队十四中队。\n",
      "********** pred_ner: {'per': [], 'loc': [], 'org': ['北京市怀柔县民政局', '畜牧局', '武警北京一总队十支队十四中队']} \n",
      "\n",
      "********** sentence: 粉碎“四人帮”后，我家中的长辈们开始和溥杰先生恢复了联系。\n",
      "********** pred_ner: {'per': ['溥杰'], 'loc': [], 'org': []} \n",
      "\n",
      "********** sentence: 到了宋代，河西走廊为西夏所有，敦煌废弃，随着海上通商的兴旺，丝绸之路也就日渐衰落。\n",
      "********** pred_ner: {'per': [], 'loc': ['河西走廊', '西夏', '敦煌'], 'org': []} \n",
      "\n",
      "********** sentence: 丁丑盛夏之日，我冒着关内难挨的酷暑来到狼牙山下。\n",
      "********** pred_ner: {'per': ['丁丑盛夏'], 'loc': ['狼牙山'], 'org': []} \n",
      "\n",
      "********** sentence: 金田集团向县政府递交了请求兼并蔡塘村的“工程建议书”报告，很快，报告就得到批准。\n",
      "********** pred_ner: {'per': [], 'loc': ['蔡塘村'], 'org': ['金田集团']} \n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Load the model (kept on the CPU for single-sentence prediction)\n",
    "reloaded_model = BiLSTM_CRF(len(word_to_id), hidden_size)\n",
    "print('*' * 27, 'Loading model weights...')\n",
    "# map_location='cpu' remaps tensors saved on a GPU, so the checkpoint also loads on a CPU-only machine\n",
    "ckpt = torch.load(checkpoint, map_location='cpu')  # dict\n",
    "model_sd = ckpt['net']\n",
    "reloaded_model.load_state_dict(model_sd)\n",
    "print('*' * 27, 'Model loaded success!')\n",
    "\n",
    "reloaded_model.eval()  # switch to eval mode\n",
    "\n",
    "sentences = [\n",
    "        '日本知名学者石川一成先生曾撰文说：面对宝顶大佛湾造像，看中华民族囊括外来文化的能力和创造能力，不禁使我目瞪口呆。',\n",
    "        '5月12日，北京市怀柔县民政局、畜牧局领导来到驻守在偏远山区的武警北京一总队十支队十四中队。',\n",
    "        '粉碎“四人帮”后，我家中的长辈们开始和溥杰先生恢复了联系。',\n",
    "        '到了宋代，河西走廊为西夏所有，敦煌废弃，随着海上通商的兴旺，丝绸之路也就日渐衰落。',\n",
    "        '丁丑盛夏之日，我冒着关内难挨的酷暑来到狼牙山下。',\n",
    "        '金田集团向县政府递交了请求兼并蔡塘村的“工程建议书”报告，很快，报告就得到批准。'\n",
    "]\n",
    "\n",
    "for sentence in sentences:\n",
    "    pred_ids, pred_tags = predict(reloaded_model, sentence, word_to_id)\n",
    "    pred_ner = get_entity(pred_tags, pred_ids, sentence)  # extract the entities\n",
    "    print('*' * 10, 'sentence:', sentence)\n",
    "    print('*' * 10, 'pred_ner:', pred_ner, '\\n')\n"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}